ABSTRACT

12/04/2011

Using the complete GAMA-I survey covering ~142 deg$^2$ to
$r_{\rm AB} = 19.4$, of which ~47 deg$^2$ is to $r_{\rm AB} = 19.8$, we
create the GAMA-I galaxy group catalogue (G$^3$Cv1), generated using a
friends-of-friends (FoF) based grouping algorithm. Our algorithm has been
tested extensively on one family of mock GAMA lightcones, constructed from
ΛCDM N-body simulations populated with semi-analytic galaxies. Recovered
group properties are robust to the effects of interlopers and are median
unbiased in the most important respects. G$^3$Cv1 contains 14,388 galaxy
groups (with multiplicity $\geq 2$), including 44,186 galaxies out of a
possible 110,192 galaxies, implying ~40% of all galaxies are assigned to a
group. The similarities of the mock group catalogues and G$^3$Cv1 are
multiple: global characteristics are in general well recovered. However, we
do find a noticeable deficit in the number of high multiplicity groups in
GAMA compared to the mocks. Additionally, despite exceptionally good local
spatial completeness, G$^3$Cv1 contains significantly fewer compact groups
with 5 or more members, this effect becoming most evident for high
multiplicity systems. These two differences are most likely due to
limitations in the physics included in the current GAMA lightcone mocks.
Further studies using a variety of galaxy formation models are required to
confirm their exact origin. The G$^3$Cv1 catalogue will be made publicly
available as and when the relevant GAMA redshifts are made available at
http://www.gama-survey.org

1 INTRODUCTION

Galaxy group and cluster catalogues have a long history in astronomy.
Early attempts at creating associations of galaxies were quite qualitative
in nature (e.g. Abell 1958; Zwicky et al. 1961), but more recently
significant effort has been devoted to robustly detecting grouped
structures (e.g. Huchra & Geller 1982; Moore, Frenk & White 1993; Eke et
al. 2004; Gerke et al. 2005; Yang et al. 2005; Berlind et al. 2006; Brough
et al. 2006; Knobel et al. 2009). The pioneering application of this
process was by Huchra & Geller (1982), where the catalogue of De
Vaucouleurs (1975), the earliest reasonably complete attempt at a group
catalogue, was reconstructed using fully quantitative means, i.e. by a
method that was reproducible and not subjective.

The power of group catalogues resides in their relation to the
theoretically motivated dark matter haloes. ΛCDM, the literature's
currently favoured structure formation paradigm, makes very strong
predictions about the self-similar hierarchical merging process that
occurs between haloes of dark matter (Springel et al. 2005). Galaxy groups
are the observable equivalent of dark matter haloes, and thus offer a
direct insight into the physics that has occurred in the dark matter haloes
in the Universe up to the present day. Further to offering a route to
studying dark matter dynamics (e.g. Plionis et al. 2006; Robotham et al.
2008), analysis of galaxy groups opens the way to understanding how
galaxies populate haloes (e.g. Cooray & Sheth 2002; Yang et al. 2003;
Cooray 2006; Robotham et al. 2006, 2010b).

The strongest differentials between competing physical models of dark
matter are found at the extremes of the halo mass function (HMF), i.e. on
cluster scales (e.g. Eke et al. 1996) and on low mass scales. The HMF
describes the comoving number density distribution of dark matter haloes as
a function of halo mass. Low mass groups are highly sensitive to the
temperature of the CDM. We either expect to see a continuation of the near
power-law prediction for the HMF down to Local Group mass haloes (see
Jenkins et al. 2001, and references therein) for a cold dark matter
Universe, or, as the dark matter becomes warmer, the slope should become
suppressed significantly.

The Galaxy and Mass Assembly project (GAMA) is a major new multi-wavelength
spectroscopic galaxy survey (Driver et al. 2011). The final redshift survey
will contain ~400,000 redshifts to $r_{\rm AB} = 19.8$ over 360 deg$^2$,
with a survey design aimed at providing an exceptionally uniform spatial
completeness (Robotham et al. 2010a; Baldry et al. 2010; Driver et al.
2011). One of the principal science goals of GAMA is to make a
statistically significant analysis of low mass groups
($M \leq 10^{13}\,M_\odot$), helping to constrain the low mass regime of
the dark matter HMF and galaxy formation efficiency in Local Group like
haloes.

As well as allowing us to determine galaxy group dynamics and composition
at the highest fidelity possible due to the increased redshift density,
GAMA will also provide multi-band photometry spanning the UV (GALEX),
visible (SDSS; VST-KIDS), near-IR (UKIDSS-LAS, VIKING), mid-IR (WISE),
far-IR (ATLAS) and radio (GMRT, ASKAP). By combining a GAMA Galaxy Group
Catalogue (G$^3$C) constructed with spatially near-complete redshifts and
21 band photometry, the GAMA project is in a unique position to answer many
of the most pressing questions that exist in extra-galactic astronomy
today. Crucially, the interplay between Star Formation Rate (SFR), stellar
mass, morphology, QSO activity and Star Formation Efficiency (SFE) with
environment can be probed in unprecedented detail. The group catalogue
presented here will also enable galaxy evolution to be investigated as a
function of halo mass, rather than with coarse environmental markers, in
statistically significant low mass regimes for the first time. This is a
huge advance on the capabilities of current large spectroscopic surveys
like SDSS and 2dFGRS that are almost single pass and hence suffer seriously
from spectroscopic incompleteness in clustered regions. GAMA, by being at
least 6 pass in every unit of sky, is exceptionally complete on all angular
scales (Robotham et al. 2010a; Driver et al. 2011).

The catalogue and group analyses presented here are based on the first
three years of spectroscopic observations (February 2008 to May 2010) made
at the Anglo-Australian Telescope (AAT). Within the GAMA project, this
period is referred to as GAMA-I, since the deeper, larger area continuation
of the GAMA survey is commonly referred to as GAMA-II.

The paper is organized as follows. §2 describes the precise FoF grouping
algorithm, the GAMA data and the lightcone mocks used for the present
analysis. The testing and grouping parameter optimisation using the mocks
are considered in §3. Group properties (i.e. velocity dispersion, radius,
dynamical mass and total luminosity) and their estimates are presented in
§4. §5 presents global group properties for G$^3$C and corresponding mock
group catalogues. A few GAMA group examples are discussed in §6, with
conclusions presented in §7. We assume throughout an $\Omega_m = 0.25$,
$\Omega_\Lambda = 0.75$, $H_0 = h\,100$ km s$^{-1}$ Mpc$^{-1}$ cosmological
model, corresponding to the cosmology of the Millennium N-body simulation
used to construct the GAMA lightcone mocks.



2 GALAXY GROUPING: ALGORITHM, DATA 
AND MOCKS 

There are many subtle differences in the specific algorithm used to
construct groups from spectroscopic surveys, but the major dichotomy occurs
at the scale of association considered: galaxy-galaxy links or halo-galaxy
links. Here we adopt galaxy-galaxy linking via a Friends-of-Friends (FoF)
algorithm (§2.1), having also explored a halo-galaxy grouping and found it
to be less successful at recovering small mass groups from our mock galaxy
catalogues. The halo method implemented was a variant of the Voronoi
tessellation scheme used in Gerke et al. (2005), which worked reasonably
well for larger groups and clusters, but was not competitive compared to
our FoF implementation in the low halo mass regime.



2.1 Friends of Friends 

A standard Friends-of-Friends algorithm creates links between galaxies
based on their separation as a measure of the local density. In practice
the projected and radial separations are treated separately, due to
significant line-of-sight effects from peculiar velocities within groups
and clusters. The comoving radial separations within a group appear larger
than the projected ones, because radial distances inferred from galaxy
redshifts contain peculiar velocity information along the line of sight on
top of their underlying Hubble distance away from the observer. Fig. 1
shows schematically how the radial and projected separations are used to
detect a group. This shows that neither the radial nor the projected
separation provides enough information to unambiguously detect a group, but
their combination generates a secure grouping.

[Figure 1: two-panel schematic; both panels carry the x-axis label
"Projected Comoving Distance". Legend: max radial separation for FoF group;
max projected separation for FoF group; implied FoF link; galaxy in real
halo; galaxy not in real halo. Actual halo group = 1, 5, 6. Radial FoF
group = 1, 2, 5, 6, 7. Projected FoF group = 1, 3, 4, 5, 6. Final common
FoF group = 1, 5, 6.]

Figure 1. Schematic of the two step process used when associating galaxies
via a FoF algorithm on redshift survey data. The same set of galaxies is
shown in two panels: along the line of sight (left) and projected on the
sky (right). Both the radial and projected separations are used to
disentangle projection effects and recover the underlying group (galaxies
1, 5 and 6 in this example). The radial linking length has to be
significantly larger than the projected one to properly account for
peculiar velocities along the line of sight.



2.1.1 Projected linking condition

In its simplest form we can say that two galaxies are associated in
projection when the following condition is met:

$$\tan[\theta_{1,2}]\,(D_{\rm com,1} + D_{\rm com,2})/2 \leq b_{i,j}\,(D_{\rm lim,1} + D_{\rm lim,2})/2, \tag{1}$$

where $\theta_{1,2}$ is the angular separation of the two galaxies,
$D_{{\rm com},i}$ is the radial distance in comoving coordinates to galaxy
$i$, $b_{i,j}$ the mean required linking overdensity and $D_{{\rm lim},i}$
is the mean comoving inter-galaxy separation at the position of galaxy $i$,
here defined as

$$D_{{\rm lim},i} = \left[\int_{-\infty}^{M_{{\rm lim},i}} \phi(M)\,{\rm d}M\right]^{-1/3}, \tag{2}$$

where $M_{{\rm lim},i}$ is the effective absolute magnitude limit of the
survey at the position of galaxy $i$, and $\phi(M)$ the survey galaxy
luminosity function (LF).

$b$ is used to specify the overdensity with respect to the mean required to
define a group. The approximate overdensity contour that this linking would
recover in a simulation (Universe) with equal mass particles (galaxies) is
given by $\rho/\bar{\rho} \sim 3/(2\pi b^3)$ (Cole et al. 1996). For a
uniform spherical distribution of mass the virial radius corresponds to a
mean overdensity of 178, hence the popularity of masses defined as being
within 178 and 200 times the mean overdensity. For an NFW type profile
(Navarro, Frenk & White 1996) the overdensity within the virial radius is
approximately $178/3 \approx 59$. This implies an interparticle linking
length of $b \approx 0.2$ in real space, corresponding to a volume
overdensity of $\sim 125$ between galaxies. Linking together 1000s of dark
matter particles in a simulation with real-space coordinates is a
relatively simple and robust process; extending this methodology to
redshift space using galaxies that trace the dark matter is non-trivial.
Consequently, it is not simply true to state that $b \approx 0.2$ will
return the virial mass limits for each galaxy group in the GAMA catalogue.
Instead, $b$ will be recovered from careful application to mock catalogues
(see below for full details). Since there are subtle effects that vary the
precise $b$ used on a galaxy by galaxy basis, the $b_{i,j}$ used above is
the mean $b$ for galaxies $i$ and $j$. In general, for nearby galaxies, $b$
does not vary significantly.

To this standard form of the mean comoving inter-galaxy separation at the
position of galaxy $i$, we introduce an extra term, with Eq. (2) thus
becoming:

$$D_{{\rm lim},i} = \left[\frac{\phi(M_{{\rm lim},i})}{\phi(M_{{\rm gal},i})}\right]^{\nu/3} \left[\int_{-\infty}^{M_{{\rm lim},i}} \phi(M)\,{\rm d}M\right]^{-1/3}, \tag{3}$$

where $M_{{\rm gal},i}$ is the absolute magnitude of galaxy $i$. This extra
term, $[\phi(M_{{\rm lim},i})/\phi(M_{{\rm gal},i})]^{\nu/3}$, allows for
larger linking distances for intrinsically brighter galaxies, provided
$\nu > 0$ and the LF is strictly increasing (which is true for GAMA).
Adjusting $\nu$ allows the algorithm to be more or less sensitive to the
intrinsic brightness of a galaxy, and can be thought of as a softening
power. The principle behind introducing this term is that associations
should be more significant between brighter galaxies, and tests on mocks
show that this generates notably better quality group catalogues as
determined from the cost function (see §3.1).
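As an illustration of how the projected linking condition of Eq. (1) and the mean inter-galaxy separation of Eqs. (2) and (3) might be evaluated, the following is a minimal sketch only. It assumes a Schechter-form LF with made-up parameters (`PHI_STAR`, `M_STAR`, `ALPHA` are hypothetical and are not the GAMA LF), a finite bright cut-off standing in for the lower integration limit, and a $\nu/3$ exponent on the extra term; all function names are illustrative.

```python
import math

# Hypothetical Schechter LF parameters, for illustration only (not the GAMA LF).
PHI_STAR, M_STAR, ALPHA = 0.01, -20.7, -1.2

def phi(M):
    """Schechter luminosity function per unit absolute magnitude."""
    x = 10.0 ** (0.4 * (M_STAR - M))
    return 0.4 * math.log(10.0) * PHI_STAR * x ** (ALPHA + 1.0) * math.exp(-x)

def number_density(M_lim, M_bright=-26.0, steps=2000):
    """Trapezoidal integral of phi(M) from M_bright (standing in for
    minus infinity) up to the absolute magnitude limit M_lim."""
    h = (M_lim - M_bright) / steps
    total = 0.5 * (phi(M_bright) + phi(M_lim))
    for k in range(1, steps):
        total += phi(M_bright + k * h)
    return total * h

def d_lim(M_lim, M_gal=None, nu=0.0):
    """Mean comoving inter-galaxy separation: Eq. (2) when no galaxy
    magnitude is given, Eq. (3) (assumed nu/3 exponent) otherwise."""
    sep = number_density(M_lim) ** (-1.0 / 3.0)
    if M_gal is not None and nu != 0.0:
        sep *= (phi(M_lim) / phi(M_gal)) ** (nu / 3.0)
    return sep

def linked_in_projection(theta, d_com1, d_com2, d_lim1, d_lim2, b):
    """Eq. (1): theta in radians, all distances in the same comoving units."""
    return math.tan(theta) * 0.5 * (d_com1 + d_com2) <= b * 0.5 * (d_lim1 + d_lim2)
```

With these toy numbers a galaxy brighter than the local magnitude limit receives a longer linking distance when $\nu > 0$, which is the behaviour the extra term is described as providing.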



2.1.2 Line-of-sight linking condition 

With Eq. (1) we have established an association in projection, but we also
require that a given pair of galaxies are associated along the
line-of-sight or radially, i.e.:

$$|D_{\rm com,1} - D_{\rm com,2}| \leq b_{i,j}\,R_{i,j}\,(D_{\rm lim,1} + D_{\rm lim,2})/2, \tag{4}$$

where $b_{i,j}$ is the linking length of Eq. (1), $D_{{\rm lim},i}$ is
given by Eq. (3) and $R_{i,j}$ is the radial expansion factor to account
for peculiar motions of galaxies within groups. With a redshift survey, the
measured redshift contains both information on the Hubble flow redshift and
any galaxy peculiar velocity along the line of sight.
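Pairs that satisfy both Eq. (1) and Eq. (4) are linked, and a group is any set of galaxies connected through such links. A minimal sketch of this two-condition linking, using a union-find structure over all pairs, is given below. It is an illustration rather than the pipeline used for the catalogue: the tuple layout, the single global `b` and `R`, the brute-force pair loop and the toy numbers are all assumptions.

```python
import math

def find(parent, i):
    """Return the root of element i, compressing the path on the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def fof_groups(galaxies, b, R):
    """Group galaxies given as (ra, dec, d_com, d_lim) tuples, with angles
    in radians and distances in the same comoving units.
    Returns lists of galaxy indices for groups with 2 or more members."""
    n = len(galaxies)
    parent = list(range(n))
    for i in range(n):
        ra1, dec1, dc1, dl1 = galaxies[i]
        for j in range(i + 1, n):
            ra2, dec2, dc2, dl2 = galaxies[j]
            # Angular separation from the spherical law of cosines.
            cos_t = (math.sin(dec1) * math.sin(dec2)
                     + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
            link = b * 0.5 * (dl1 + dl2)
            # Projected condition, Eq. (1); guard against tan() wrapping.
            projected = (theta < math.pi / 2
                         and math.tan(theta) * 0.5 * (dc1 + dc2) <= link)
            # Radial condition, Eq. (4): the same link stretched by R.
            radial = abs(dc1 - dc2) <= R * link
            if projected and radial:
                parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return [members for members in groups.values() if len(members) >= 2]

# Toy example: galaxies 0-1 and 1-2 are directly linked, 0-2 are not,
# and galaxy 3 is too far away on the sky to join.
toy = [(0.0, 0.0, 300.0, 5.0), (0.0006, 0.0, 301.0, 5.0),
       (0.0012, 0.0, 302.0, 5.0), (0.05, 0.0, 300.0, 5.0)]
toy_groups = fof_groups(toy, b=0.06, R=18.0)
```

In the toy example galaxies 0 and 2 end up in the same group only through their common link with galaxy 1. The quadratic pair loop is chosen for clarity; a survey-sized sample would need a spatial index.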
2.1.3 Global linking conditions

To construct a group catalogue we link together all associations that meet
our criteria given by Eq. (1) and Eq. (4). Galaxies that are not directly
linked to each other can still be grouped together by virtue of common
links between them. All possible groups are constructed in precisely this
manner, leaving either completely ungrouped galaxies or galaxies in groups
with 2 or more members.

Despite its apparent simplicity, a FoF algorithm is still a very parametric
approach to grouping. On top of the assumed cosmology, it requires the
survey selection function, and values for the linking parameters $b$ and
$R$. The galaxy LF can be directly estimated from the data (e.g. Loveday et
al. 1995; Norberg et al. 2002; Blanton et al. 2003), while the linking
parameters cannot be estimated from the data. Instead they are commonly
determined from either analytic calculations or analyses of N-body
simulations populated with galaxies, with the latter approach taken here
(see §2.3 for the description of the GAMA lightcone mocks).

Merely using a static combination of $b$ and $R$ is less than optimal for
accurately reconstructing groups in the mock data. An obvious shortcoming
is that galaxies in clusters are significantly spread out along the
line-of-sight, due to their large peculiar velocities, a result of being
bound to massive structures. To account for this we introduce a local
environment measure that calculates the density contrast of a cylinder that
is centred on the galaxy of interest. Similar to the approach of Eke et al.
(2004), we allow the $b$ and $R$ parameters to scale as a function of the
observed density contrast, leading to position ($\mathbf{r}$) and faint
magnitude limit ($m_{\rm lim}$) dependent linking parameters:

$$b(\mathbf{r}, m_{\rm lim}) = b_0 \left[\frac{1}{\Delta}\,\frac{\rho_{\rm emp}(\mathbf{r}, m_{\rm lim})}{\bar{\rho}(\mathbf{r}, m_{\rm lim})}\right]^{E_b} \tag{5}$$

$$R(\mathbf{r}, m_{\rm lim}) = R_0 \left[\frac{1}{\Delta}\,\frac{\rho_{\rm emp}(\mathbf{r}, m_{\rm lim})}{\bar{\rho}(\mathbf{r}, m_{\rm lim})}\right]^{E_R} \tag{6}$$

where $\bar{\rho}$ is the average local density implied by the selection
function, $\rho_{\rm emp}$ is the empirically estimated density,
$m_{\rm lim}$ the apparent magnitude limit at position $\mathbf{r}$ and
$\Delta$ is the density contrast, an additional free parameter together
with $E_b$ and $E_R$. For this work $\bar{\rho}$ is estimated from the
galaxy selection function at $\mathbf{r}$ (i.e. it varies with the GAMA
survey depth). $\rho_{\rm emp}$ is calculated directly from the number
density within a comoving cylinder centred on $\mathbf{r}$ and of projected
radius $r_\Delta$ and radial extent $l_\Delta$. $\Delta$ determines the
transition between where the power scaling reduces or increases the linking
lengths, so a galaxy within a local volume precisely $\Delta$ times
overdense will not have its links altered. The exact values for $E_b$,
$E_R$ and $\Delta$ are determined from the joint optimisation of the group
cost function (see §3.1) for all the parameters that affect the quality of
the grouping when tested on the mocks. The parameters required for the FoF
algorithm described above are now: $b_0$, $R_0$, $\Delta$, $r_\Delta$,
$l_\Delta$, $E_b$, $E_R$ and $\nu$. Whilst there are many parameters, $b_0$
and $R_0$ are the dominant ones for the grouping, the latter 6 merely
determining how best to modify the linking locally, and typically
introducing minor perturbations to the grouping.

2.1.4 Completeness corrections

Since the GAMA survey is highly complete (above 98% within the r-band
limits) the effect of incompleteness is minor, and tests on the mocks
indicate the final catalogues are extremely similar regardless of whether
the linking length is adjusted based on the local completeness. A number of
definitions of local completeness were investigated: completeness within a
pixel on a mask, completeness on a fixed angular top-hat scale around each
galaxy and a completeness window function that represents the physical
scale of a group on the sky. The difference between each was quite minor,
but defining completeness on a physical scale produced marginally better
grouping costs (§3.2). Hence the completeness corrected linking parameter
$b$ at position $\mathbf{r}$ is given by:

$$b_{\rm comp}(\mathbf{r}, m_{\rm lim}) = \frac{b(\mathbf{r}, m_{\rm lim})}{c(\mathbf{r})^{1/3}} \tag{7}$$

where $c(\mathbf{r})$ is the redshift completeness within a projected
comoving radius of 1.0 Mpc centred on $\mathbf{r}$. The effect is to
slightly increase the linking length to account for the loss of (possible)
nearby galaxies that it could otherwise be linked with. Since GAMA was
designed to be extremely complete even at small angular scales (Robotham et
al. 2010a), the mean modifications are less than 1%.
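The local scaling of the linking parameters (Eqs. 5 and 6) and the completeness correction (Eq. 7) amount to a few lines of arithmetic. The sketch below is illustrative only: it assumes the power-law form described in the text, with exponents $E_b$ and $E_R$ applied to the density contrast divided by $\Delta$, and every numerical value in it is a made-up placeholder.

```python
def scaled_linking(b0, R0, rho_emp, rho_mean, delta, E_b, E_R):
    """Eqs. (5)-(6): scale b0 and R0 by a power of the local density
    contrast, normalised so that a contrast of exactly delta leaves
    both parameters unchanged."""
    contrast = rho_emp / (delta * rho_mean)
    return b0 * contrast ** E_b, R0 * contrast ** E_R

def completeness_corrected(b, completeness):
    """Eq. (7): lengthen the link where redshifts are missing."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must lie in (0, 1]")
    return b / completeness ** (1.0 / 3.0)

# Placeholder numbers: a region 9 times denser than the mean with
# delta = 9 is left untouched, then corrected for 98% completeness.
b_local, R_local = scaled_linking(0.06, 18.0, rho_emp=0.09, rho_mean=0.01,
                                  delta=9.0, E_b=0.1, E_R=0.1)
b_final = completeness_corrected(b_local, 0.98)
```

For a completeness of 98% the correction factor is $0.98^{-1/3} \approx 1.007$, in line with the statement that the mean modifications are below 1%.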



2.2 Data: GAMA survey 

Extensive details of the GAMA survey characteristics are given in Driver et
al. (2011), with the survey input catalogue described in Baldry et al.
(2010) and the spectroscopic tiling algorithm in Robotham et al. (2010a).

Briefly, the GAMA-I survey covers three regions each 12 × 4 degrees centred
at 09h, 12h and 14h30m (respectively G09, G12 and G15 from here). The
survey depths and areas relevant for this study are: ~96 deg$^2$ to
$r_{\rm AB} = 19.4$ (G09 and G15) and ~47 deg$^2$ to $r_{\rm AB} = 19.8$
(G12)$^1$. All regions are more than 98% complete (see Driver et al. 2011
for precise completeness details), with special emphasis on a high close
pair completeness, which is greater than 95% for all galaxies with up to 5
neighbours within $40''$ of them (see Fig. 19 of Driver et al. 2011)$^2$.
Despite this high global redshift completeness, we still apply completeness
corrections to the FoF algorithm (as described in §2.1) and use the masks
described in Baldry et al. (2010) and Driver et al. (2011), to account for
areas masked out by bright stars, poor imaging, satellite trails, etc. The
velocity errors on GAMA redshifts are typically $\approx 50$ km s$^{-1}$
(Driver et al. 2011), slightly larger than the nominal SDSS velocity
uncertainties of $\approx 35$ km s$^{-1}$ but significantly better than the
typical $\approx 80$ km s$^{-1}$ associated with 2dFGRS redshifts (Colless
et al. 2001).

For this study, we use a global GAMA $(k+e)$-correction, of the form:

$$(k+e)(z) = \sum_{i=0}^{N} a_i(z_{\rm ref}, z_p)\,(z - z_p)^i + Q_{z_{\rm ref}}\,(z - z_{\rm ref}) \tag{8}$$

where $z_{\rm ref}$ is the reference redshift to which all galaxies are
$(k+e)$-corrected, $Q_{z_{\rm ref}}$ is a single luminosity evolution
parameter (as in e.g. Lin et al. 1999), $z_p$ is a reference redshift for
the polynomial fit to the median KCORRECT-v4.2 k-correction (Blanton &
Roweis 2007) of GAMA-I galaxies, and $a_i(z_{\rm ref}, z_p)$ the
coefficients of that polynomial fit. The present study uses
$z_{\rm ref} = 0$, $Q_0 = 1.75$, $z_p = 0.2$ and $N = 4$, with
$a = \{0.2085, 1.0226, 0.5237, 3.5902, 2.3843\}$, for both data and mocks.
The precise value for $Q_0 = 1.75$ is not essential, as our estimate of the
luminosity function accounts for any residual redshift evolution.

$^1$ See Baldry et al. (2010) for additional GAMA-I selections.
$^2$ 99.8% of all galaxies have 5 or fewer neighbours within $40''$.

Once the global $(k+e)$-correction has been defined, it is straightforward
to estimate the redshift dependent galaxy luminosity function using a
non-parametric estimator like the Step-Wise Maximum Likelihood (SWML) of
Efstathiou et al. (1988). We perform this analysis in five disjoint
redshift bins, which are all correlated through the global normalisation
constraint. This is set by the cumulative number counts at
$r_{\rm AB} = 19.8$ (~1050 galaxies/deg$^2$), as estimated directly from
the full GAMA survey and compared to ~6250 deg$^2$ of the SDSS DR6 survey
(to account for possible sample variance issues). This LF estimate is used
both to describe the survey selection function (as required by Eqs. 1-6)
and to adjust the galaxy magnitudes in the GAMA mock catalogues (see §2.3),
and is hereafter referred to as $\phi_{\rm GAMA}$.



2.3 GAMA mock catalogues 

To appropriately test the quality and understand the intrin- 
sic limitations of a given group finder it is essential to test it 
thoroughly on a series of realistic mock galaxy catalogues, 
for which the true grouping is known. Those tests should in- 
clude all the limitations of the real spectroscopic survey, e.g. 
spectroscopic incompleteness, redshift uncertainties, varying 
magnitude limits, etc. 

In this first paper on GAMA groups, we limit our tests of the group finding
algorithm to one single type of mock galaxy catalogue, constructed from the
Millennium dark matter simulation (Springel et al. 2005), populated with
galaxies using the GALFORM Bower et al. (2006) semi-analytic galaxy
formation recipe. The galaxy positions are interpolated between the
Millennium snapshots to best mimic the effect of a proper lightcone output,
enabling the mocks to include the evolution of the underlying dark matter
structures along the line of sight, key for a survey of the depth of GAMA
that spans ~4 Gyr. Finally, the semi-analytic galaxies have their SDSS
r-band filter magnitudes modified to give a perfect match to the redshift
dependent GAMA luminosity and selection function (see §2.2; Loveday et al.,
in prep). When adjusting the magnitudes, we use the global GAMA $k+e$
correction of Eq. (8). The 9 mock galaxy catalogues have the exact GAMA
survey geometry, with each mock extracted from the N-body simulation while
preserving the true angular separation between the three GAMA regions.

The main limitations of this first generation of GAMA 
mock galaxy catalogue for the present group study are listed 
below: 

1) the luminosity dependent galaxy clustering does not perfectly match the
data (Kim et al. 2009), in particular in redshift space (Norberg et al. in
prep). By their nature, semi-analytic mock galaxy catalogues are not
constrained to match the observed clustering signal in any great detail (as
opposed to halo occupation distribution (HOD) or conditional luminosity
function (CLF) mocks, e.g. Cooray & Sheth 2002; Yang et al. 2003; Cooray
2006).

2) the GAMA survey is so spectroscopically complete to the GAMA-I survey
limits (above 98% on scales relevant for this study) that no attempt at
modelling any residual survey incompleteness in the mocks has been made.

3) apparent magnitude uncertainties have a negligible effect on the GAMA
survey selection and hence are not accounted for in these mocks.

4) velocity measurement uncertainties are not incorporated into the mocks.

5) the 9 GAMA mocks are not statistically independent, as they are drawn
from a single N-body simulation. However, we ensure in the construction of
the different mocks that no single galaxy at the exact same stage of
evolution is found in more than one mock, i.e. there is no spatial overlap
between the 9 GAMA lightcone mocks created.

6) despite the high numerical resolution of the Millennium dark matter
simulation, the lightcones used for this work, once the shift in magnitudes
has been accounted for, are not complete below
$M_r - 5\log_{10} h \approx -14.05$. This limit is faint enough that we do
not attempt to address this issue in this first generation of GAMA mocks.

7) the halo definition used in these mocks corresponds to the standard halo
definition of GALFORM (Cole et al. 2000; Bower et al. 2006; Benson & Bower
2010), i.e. DHalo (Helly et al. 2003), as listed in the Millennium GAVO
database$^3$. DHalo is a collection of SubFind subhaloes (Springel et al.
2001) grouped together to make a halo. The differences between DHalo and
FoFHalo$^4$ are subtle. A preliminary analysis on a small fraction of the
mock data shows that the log ratio of the DHalo and FoFHalo masses is
median unbiased, and exhibits a 1-$\sigma$ scatter of 0.05 dex. The 10% of
the population that exhibits the largest mass mismatch is still median
unbiased (i.e. it will not affect the median relationship between the FoF
masses we measure and the intrinsic dark matter mass of the halo), but can
scatter more than 1 dex away from the median. Because the two halo mass
definitions are not biased w.r.t. each other, the DHalo mass can be used
safely in this paper as a halo mass definition.

8) the most luminous galaxy of a halo is nearly always at its centre and at
rest w.r.t. the dark matter halo.

These mocks are a subset of the first generation of wide and deep mock
galaxy catalogues for the Pan-STARRS PS1 survey. Further details on their
construction are given in Merson et al. (in prep.).



$^3$ http://www.g-vo.org/Millennium

$^4$ FoFHalos are identified with a linking length of $b = 0.2$ in the
underlying dark matter simulation.

3 PARAMETER OPTIMISATION USING
MOCK CATALOGUES

The minimisation or maximisation of non-analytic functions that depend on
multiple parameters is an intense research area in statistics and
computational science. When the dimensionality of the dataset is low,
typically 2-4 dimensions, it is straightforward to completely map out the
whole parameter space on a grid. However, when the number of parameters is
large (e.g. up to 8 for our FoF algorithm) such a computationally intensive
approach is not ideal, especially if each set of parameter values requires
a series of complex calculations. For our data size and the problem
considered, each complete grouping takes tens of seconds, and the full
parameter space is not necessarily obvious to define. Hence we use the
Nelder-Mead optimisation technique (i.e. downhill simplex, see Nelder &
Mead 1965), which allows maxima (or minima) to be investigated for
non-differentiable functions. The onus is still on the user to choose an
appropriate function to maximise. For this work we desire a high group
detection rate with a low interloper fraction in each group, and these are
the criteria that define the cost function to be optimised.



3.1 Group cost function 

One of the defining characteristics of how we decide to determine grouping
quality is that the statistics measured should be two-way (bijective). By
this we mean that the group catalogue made with this algorithm is an
accurate representation of the mock group catalogue, and vice-versa. This
is an important distinction since it is possible for the group catalogue to
perfectly recover every mock group, but for these to be the minority of the
final catalogue, i.e. most of the groups are spurious. This has a serious
effect on almost any science goal involving use of the GAMA groups since
any given group would be more likely to be false than real: follow up
proposals making use of the groups would be highly inefficient, and any
science involving the stacking of detections of multiple groups (X-ray, HI)
would be hard to achieve.

With this two-way nature of defining grouping quality in mind, there are
two global measures that can be ascertained: how well the groups, and the
galaxies within them, are recovered. To retrieve a group accurately we
require the joint galaxy population of the FoF groups and mock haloes to
include more than 50% of their respective group members. This is called a
bijective match, and it ensures that there is no ambiguity when we
associate groups together, since it is impossible for a group to
bijectively match more than one group. To turn this into a global grouping
efficiency statistic we define the following quantities:



$$E_{\rm FoF} = \frac{N_{\rm bij}}{N_{\rm gFoF}} \tag{9}$$

$$E_{\rm mock} = \frac{N_{\rm bij}}{N_{\rm gmock}} \tag{10}$$

$$E_{\rm tot} = E_{\rm FoF}\,E_{\rm mock} \tag{11}$$

where $N_{\rm bij}$, $N_{\rm gFoF}$ and $N_{\rm gmock}$ are the number of
bijective, FoF and mock groups respectively. $E_{\rm tot}$ is the global
halo finding efficiency measurement (or purity product) we want to use in
our maximisation statistic, and will be 1 if all groups are bijectively
found, and 0 if no groups are determined bijectively.

The second measure of group quality determines how significantly matched
individual groups are; in effect it determines the 'purity' of the matching
groups. The best two-way matching group is the one which has the largest
product for the relative membership fractions between the FoF and mock
group. Take for example a FoF group with 5 members where 3 of these
galaxies are shared with a mock group that has 9 members and the other 2
are shared with a mock group that has 3 members. In this case the two
possible purity products are
$\frac{3}{5} \times \frac{3}{9} = \frac{1}{5} = 0.2$ and
$\frac{2}{5} \times \frac{2}{3} = \frac{4}{15} \approx 0.27$, so the latter
match would be considered the best quality match. We note in this example
that the FoF group is not bijectively matched to any mock group. From the
definition of a bijective group above, it is clear that the match quality
for a bijective group must always be larger than
$\frac{1}{2} \times \frac{1}{2} = 0.25$. Globally we define the following
statistics:

$$Q_{\rm FoF} = \frac{\sum_{i=1}^{N_{\rm gFoF}} P_{\rm FoF}[i]\,N_{\rm mFoF}[i]}{\sum_{i=1}^{N_{\rm gFoF}} N_{\rm mFoF}[i]} \tag{12}$$

$$Q_{\rm mock} = \frac{\sum_{i=1}^{N_{\rm gmock}} P_{\rm mock}[i]\,N_{\rm mmock}[i]}{\sum_{i=1}^{N_{\rm gmock}} N_{\rm mmock}[i]} \tag{13}$$

$$Q_{\rm tot} = Q_{\rm FoF}\,Q_{\rm mock} \tag{14}$$

where $N_{\rm mFoF}[i]$ and $N_{\rm mmock}[i]$ are the number of group
members in the $i^{\rm th}$ FoF and mock group respectively.
$P_{\rm FoF}[i]$ and $P_{\rm mock}[i]$ are the purity products of the
$i^{\rm th}$ best matching FoF and mock group respectively. In the example
above $P_{\rm FoF} \approx 0.27$ and $N_{\rm mFoF} = 5$. If a halo is
perfectly recovered between the FoF and mock then $P_{\rm FoF}$ and
$P_{\rm mock}$ both equal 1 for that matching halo. $Q_{\rm tot}$ is the
global grouping purity we want to use in our maximisation statistic, and
will be 1 if all groups are found perfectly in the FoF catalogue. The lower
limit must be more than 0 (since it is always possible to break a catalogue
with $N_{\rm gal}$ galaxies into a catalogue of $N_{\rm gal}$ groups), and
at worst $Q_{\rm tot} = N_{\rm gmock}/N_{\rm gal}$.

Using $E_{\rm tot}$ and $Q_{\rm tot}$ we can now calculate our final
summary statistic:

$$S_{\rm tot} = E_{\rm tot}\,Q_{\rm tot}, \tag{15}$$

where $S_{\rm tot}$ will span the range 0-1.
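The cost function of Eqs. (9)-(15) can be sketched compactly if each group is represented as a set of galaxy identifiers. The code below is one possible reading, not the code used for the catalogue: in particular it assumes that the purity statistics of Eqs. (12) and (13) are membership-weighted means (normalised by the total number of group members), and all function names are illustrative.

```python
def purity_product(a, b):
    """Product of the two relative membership fractions shared by a and b."""
    shared = len(a & b)
    return (shared / len(a)) * (shared / len(b))

def is_bijective(a, b):
    """True when the shared galaxies exceed 50% of both groups."""
    shared = len(a & b)
    return shared / len(a) > 0.5 and shared / len(b) > 0.5

def best_purity(group, others):
    """Largest purity product of `group` against any group in `others`."""
    return max((purity_product(group, other) for other in others), default=0.0)

def weighted_purity(groups, others):
    """Membership-weighted mean of the best purity products (assumed form
    of Eqs. 12 and 13)."""
    total = sum(len(g) for g in groups)
    return sum(best_purity(g, others) * len(g) for g in groups) / total

def grouping_cost(fof, mock):
    """S_tot = E_tot * Q_tot for lists of sets of galaxy identifiers."""
    n_bij = sum(1 for f in fof for m in mock if is_bijective(f, m))
    e_tot = (n_bij / len(fof)) * (n_bij / len(mock))
    q_tot = weighted_purity(fof, mock) * weighted_purity(mock, fof)
    return e_tot * q_tot

# The worked example from the text: a 5-member FoF group sharing 3 galaxies
# with a 9-member mock group and 2 galaxies with a 3-member mock group.
fof_example = {1, 2, 3, 4, 5}
mock_a = {1, 2, 3, 10, 11, 12, 13, 14, 15}
mock_b = {4, 5, 20}
example_purity = best_purity(fof_example, [mock_a, mock_b])
```

Because a group can bijectively match at most one counterpart, counting bijective pairs is the same as counting bijectively recovered groups.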

3.2 Optimisation 

Whilst it is possible to optimise the set of grouping parameters such that
the absolute maximum value for $S_{\rm tot}$ is obtained, in practice some
of the parameters barely affect the returned group catalogue as long as
sensible values are chosen. For FoF group finding, $\Delta$, $r_\Delta$ and
$l_\Delta$ have a weak effect on the final grouping, and fixing them at 9,
1.5 $h^{-1}$ Mpc and 12 proved to be almost as effective as allowing them
to be freely optimised. For expediency they were fixed after this initial
determination. The other 5 FoF parameters do require optimisation; the
descending order of parameter importance is: $b_0$, $R_0$, $E_b$, $E_R$ and
$\nu$.

As well as choosing the set of parameters to adjust, the set of groups
chosen as the basis of optimisation must be considered carefully. The
optimisation strategy has to be defined depending on the desired goals.
Most further studies will make use of the largest and best fidelity groups,
and these groups suffer disproportionately if the optimisation is carried
out using smaller systems and then applied to all of the mock data. Because
of this, only groups with 5 or more members were used to determine the
appropriate combination of parameters. Part of the justification for this
is that 5 or more members are required to make a meaningful estimate of the
dynamical velocity dispersion ($\sigma_{\rm FoF}$) and 50$^{\rm th}$
percentile radius (Rad$_{50-{\rm group}}$).

GAMA: The GAMA Galaxy Group Catalogue (G$^3$Cv1)

To optimise the overall grouping to maximise the output
of $S_{\rm tot}$ we used a standard Nelder-Mead (Nelder & Mead
1965) approach, using the optim function available in the
R programming environment. We simultaneously attempted
to find the optimal combination of the 5 specified parameters
for all 9 mock GAMA volumes, a process that took $\sim$ 2 days
of CPU time. The optimisation was done for 3 different
magnitude limits: $r_{\rm AB} \le 19.0$ mag, $r_{\rm AB} \le 19.4$ mag and
$r_{\rm AB} \le 19.8$ mag. The returned parameters were extremely
similar. The set generated for $r_{\rm AB} \le 19.4$ was the best compromise,
producing the highest overall cost for all 3 depths
combined. Since the solutions were so similar, we took the
parameters found for $r_{\rm AB} \le 19.4$ as the single set to be used
for all analysis. Table 1 contains the optimal values of
the 5 parameters investigated.
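
The simplex search described above can be illustrated with a small pure-Python Nelder-Mead routine. This is a sketch only: it is a hand-written stand-in for R's optim, not the code used for the catalogue, and `toy_s_tot` is a hypothetical smooth surrogate for $S_{\rm tot}$ (the real statistic requires running the group finder on the mocks at every step). Since the method minimises, the negative of the statistic is passed in.

```python
def nelder_mead(cost, start, step=0.5, xtol=1e-8, max_iter=2000):
    """Minimise cost(x) with a Nelder-Mead simplex search."""
    n = len(start)
    simplex = [list(start)]
    for j in range(n):
        vertex = list(start)
        vertex[j] += step
        simplex.append(vertex)
    values = [cost(v) for v in simplex]

    for _ in range(max_iter):
        # order vertices from best (lowest cost) to worst
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        size = max(abs(v[j] - simplex[0][j]) for v in simplex for j in range(n))
        if size < xtol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # point on the line through the centroid, away from the worst vertex
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = cost(reflected)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = cost(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            # outside contraction if the reflection helped at all, else inside
            contracted = along(0.5 if f_r < values[-1] else -0.5)
            f_c = cost(contracted)
            if f_c < min(f_r, values[-1]):
                simplex[-1], values[-1] = contracted, f_c
            else:
                # shrink every vertex halfway towards the best one
                best = simplex[0]
                simplex = [[b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex]
                values = [cost(v) for v in simplex]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


def toy_s_tot(params):
    """Hypothetical surrogate for S_tot, peaking at 0.4 for params = (1, -2)."""
    x, y = params
    return 0.4 - (x - 1.0) ** 2 - 0.5 * (y + 2.0) ** 2


best_params, best_cost = nelder_mead(lambda p: -toy_s_tot(p), [0.0, 0.0])
print(best_params, -best_cost)
```

In the real problem each cost evaluation is expensive, which is why a derivative-free method with few evaluations per step is a natural choice.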

The most significant fact to highlight in Table 1 is that
$E_b$ and $E_R$ are so close to zero that their effect is completely
negligible. Interestingly, if we instead attempt the same optimisation
problem but remove $\nu$, these parameters become
more significant, but the final cost for the optimisation is
not as good. This means the 3 parameters adapt in a degenerate
manner, but the luminosity based adaptation is the
most successful, and is the parameter most fundamentally related
to optimal galaxy groups. The GAMA galaxy group
catalogue will still use all 5 parameters as specified, but we
note that in future extensions to this work $E_b$ and $E_R$ may
be removed.

It is clear that the chosen set of parameters produces
very similar final $S_{\rm tot}$ for all depths ($\sim 0.4$). This implies
that on average $E_{\rm FoF}$, $E_{\rm mock}$, $Q_{\rm FoF}$ and $Q_{\rm mock}$ are all $\sim 0.8$.
Even though no restriction is made in terms of which grouping
direction has most significance, the breakdown of each
global grouping component indicates that the cost is most
easily increased by improving the overall halo finding efficiency,
where for $N_{\rm FoF} \ge 5$ (a useful selection since larger
groups are typically harder to group accurately), $E_{\rm tot} = 0.69$
and $Q_{\rm tot} = 0.53$. The contribution to the overall cost is also
slightly asymmetric from the mock and FoF components:
$E_{\rm mock} = 0.89$, $E_{\rm FoF} = 0.77$, $Q_{\rm mock} = 0.73$ and $Q_{\rm FoF} = 0.80$.
Overall, the contribution of the mock groups to $S_{\rm tot}$ is 0.65, and that of
the FoF groups is 0.62. These numbers indicate that the
FoF algorithm must recover, on average, more groups than
actually exist in the mock data. Also, the FoF algorithm is
slightly better at constructing the groups it finds than it is at
recovering haloes from the data. These statistics mean that
the most successful algorithm is necessarily a conservative
one where real haloes are robustly and unambiguously detected,
and interloper rates are kept low in these systems. This
is required since it is very easy to create spurious group detections
once the grouping is more generous.



3.2.1 Parameter sensitivity 

To assess how sensitive the best parameters found are to per- 
turbations in the volume investigated (sample variance) we 
made optimisations for each of the 9 GAMA mock volumes. 
The distribution of the parameters gives us an indication 
of both how well constrained they are, and how degenerate 
they are with respect to the other parameters. 

A principal component analysis (PCA) of the outcome for 5 parameters
optimised to 9 volumes suggests nearly all the parameter variance
is explained with just two principal components. The
most significant parameters are $b$ and $\nu$, and these are
anti-correlated. $R$ is the only other significant parameter
that contributes to component 1, and this is anti-correlated
closely with $b$. $E_b$ and $E_R$ dominate the second component,
and they are strongly anti-correlated.

Table 2 shows the 1-$\sigma$ spread in optimal parameter values
obtained, and gives an indication of how stable our parameters
are to the sample selection. The only surprising
fact is that $E_R$ is prone to vary by quite a large amount depending
on the volume; however, this is precisely because it
has the least influence on the quality of any grouping outcome,
and hence a large change can cause minor improvements in
the grouping. $b$ is extremely well constrained, which is important
to know since it is comfortably the most significant
parameter for any FoF grouping algorithm.

4 GROUP PROPERTIES, RELIABILITY AND 
QUALITY OF GROUPING ALGORITHM 

Whilst the primary aim of the grouping algorithm is to maximise
the accuracy of the content of the groups, it is essential
to derive well determined global group properties. The group
velocity dispersion ($\sigma_{\rm FoF}$) and radius ($r_{\rm FoF}$) are key properties
to recover accurately, as they form the most directly inferred
group characteristics, together with the group centre
and total group luminosity ($L_{\rm FoF}$). The importance of their
precise recovery is further strengthened by the expectation
that a reasonable dynamical mass estimator is proportional
to $\sigma_{\rm FoF}^2$ and $r_{\rm FoF}$ (§4.3).

There are many ways to estimate $\sigma_{\rm FoF}$ and $r_{\rm FoF}$, but
it is essential for the estimates to be median unbiased and
robust to slight perturbations in group membership. Both
constraints are important so as to not make our group properties
overly sensitive to some precise aspect of the grouping
algorithm (a process that will never produce a perfect catalogue).

Hereafter we adopt the following notation. $X_{\rm FoF}$ and
$X_{\rm halo}$ correspond to a quantity $X$ measured using galaxies
of the Friends-of-Friends mock group and of the underlying/true
Dark Matter haloes respectively. The estimate of
$X$ is done with the same method both times, i.e. only the
galaxy membership changes between the two measurements
for matched FoF and halo groups. Matching in the mocks
corresponds to the best group matching between FoF groups
and intrinsic haloes, defined as the two way match that produces
the highest $Q_{\rm tot}$ (see §3.1 for further details). We refer
to group multiplicity, $N_{\rm FoF}$, as the number of group members
a given FoF group has, which has to be distinguished
from $N_{\rm halo}$, the true number of group members of a given
halo. $X_{\rm mock}$ is a value based on an output of the semi-analytic
mock groups directly; it is not measured using a
similar method as for the FoF groups. In practice, only the
total luminosity of the galaxies in the mock group ($L_{\rm mock}$)
requires this notation since it is found by summing up
the flux of all group members beyond the magnitude limit
of the simulated light cone. Finally, $X_{\rm DM}$ refers to a property
that is measured from the Millennium Simulation dark
matter haloes themselves (so not dependent on the semi-analytics
in any manner). In practice, only the total mass
of all dark matter particles within the halo ($M_{\rm DM}$) requires
this notation.

$b_0$   $R_0$   $E_b$   $E_R$   $\nu$   $S_{\rm tot}(r_{\rm AB} \le 19.0)$   $S_{\rm tot}(r_{\rm AB} \le 19.4)$   $S_{\rm tot}(r_{\rm AB} \le 19.8)$
0.06    18      -0.00   -0.02   0.63    0.40                               0.42                               0.41

Table 1. The optimal global parameters for all groups with $N_{\rm FoF} \ge 5$.

$\sigma_{b_0}$   $\sigma_{R_0}$   $\sigma_{E_b}$   $\sigma_{E_R}$   $\sigma_\nu$   $\sigma_{b_0}/b_0$   $\sigma_{R_0}/R_0$   $\sigma_\nu/\nu$
0.00             1                0.02             0.10             0.06           0.03                 0.04                 0.09

Table 2. The 1-$\sigma$ spread of the optimal grouping parameters found for the 9 different mock GAMA lightcones. For the three most
important parameters, their relative spread is indicated as well.

4.1 Velocity dispersion estimator 

The group velocity dispersion, $\sigma_{\rm FoF}$, is measured with
the gapper estimator introduced by Beers et al. (1990)
and used for velocity dispersion estimates in e.g. 2PIGG
(Eke et al. 2004). This estimator is unbiased, even for low
multiplicity systems, and is robust to weak perturbations in
group membership.

In summary, for a group of multiplicity $N = N_{\rm FoF}$, all
recession velocities are ordered within the group and the gaps
between each velocity pair are calculated using $g_i = v_{i+1} - v_i$
for $i = 1, 2, ..., N-1$, as well as weights defined by $w_i = i(N - i)$.
The velocity dispersion is then estimated via:

$\sigma_{\rm gap} = \frac{\sqrt{\pi}}{N(N-1)} \sum_{i=1}^{N-1} w_i g_i$.   (16)

Based on the fact that in the majority of mock haloes the
brightest galaxy is moving with the halo centre of mass,
the velocity dispersion is increased by an extra factor of
$\sqrt{N/(N-1)}$ (as implemented in Eke et al. 2004). Eq. 16
assumes no uncertainty on the recession velocities, while in
reality the accuracy of the redshifts (and therefore recession
velocities) depends among other things on the galaxy survey
considered. To account for this the velocity dispersion is
further modified by the total measurement error $\sigma_{\rm err}$ being
removed in quadrature, giving:

$\sigma = \sqrt{\frac{N}{N-1}\,\sigma_{\rm gap}^2 - \sigma_{\rm err}^2}$.   (17)

The total measurement error $\sigma_{\rm err}$ is the result of adding together
the expected velocity error for each individual galaxy
in quadrature, where we account for the survey origin of
the redshift, the leading source of uncertainty in estimating
$\sigma_{\rm err}$. The GAMA redshift catalogue is mainly composed of
redshifts from GAMA ($\sim$ 84%), SDSS ($\sim$ 12%) and 2dFGRS
($\sim$ 3%) where the typical errors are $\sim$ 50 km s$^{-1}$,
$\sim$ 30 km s$^{-1}$ and $\sim$ 80 km s$^{-1}$ (see Driver et al. 2011 for
further details on the redshift uncertainties in the GAMA
catalogue).
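
Eqs. 16 and 17 translate directly into code. The sketch below assumes velocities and $\sigma_{\rm err}$ are given in the same units (e.g. km s$^{-1}$) and that $\sigma_{\rm err}$ has already been computed for the group; clamping a negative variance to zero is an assumption of this example, and the function names are hypothetical.

```python
import math


def gapper_dispersion(velocities):
    """Gapper estimate of the velocity dispersion (Eq. 16), in the input units."""
    v = sorted(velocities)
    n = len(v)
    if n < 2:
        raise ValueError("need at least two velocities")
    # weights w_i = i (N - i) multiply the gaps g_i = v_{i+1} - v_i
    total = sum(i * (n - i) * (v[i] - v[i - 1]) for i in range(1, n))
    return math.sqrt(math.pi) / (n * (n - 1)) * total


def group_dispersion(velocities, sigma_err):
    """Eq. 17: apply the N/(N-1) factor and remove sigma_err in quadrature."""
    n = len(velocities)
    variance = n / (n - 1) * gapper_dispersion(velocities) ** 2 - sigma_err ** 2
    return math.sqrt(max(variance, 0.0))  # clamp at zero (assumption)


print(group_dispersion([15020.0, 15310.0, 14890.0, 15150.0, 15075.0], 50.0))
```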

Fig. 2 shows the distribution of the log-ratio of the measured/recovered
velocity dispersion ($\sigma_{\rm FoF}$) to the intrinsic
galaxy velocity dispersion ($\sigma_{\rm halo}$), for best matching FoF/halo
mock groups. Explicitly, $\sigma_{\rm halo}$ is estimated using Eq. 16
with mock GAMA galaxies belonging to the same underlying
halo, i.e. $\sigma_{\rm halo}$ does not correspond to the underlying
dark matter halo velocity dispersion. Furthermore $\sigma_{\rm halo}$ is
estimated using only the line-of-sight velocity information.
Hence a perfect grouping would result in $\delta$-Dirac distributions
in Fig. 2. The fact that these distributions are so tight is a
reflection of the quality of the FoF grouping. For $\sim$ 80.4%
($\sim$ 50%) of all mock groups, the recovered $\sigma_{\rm FoF}$ is within
$\sim$ 50% ($\sim$ 14%) of the intrinsic value. The distributions are
median unbiased for most multiplicities with the mode close
to zero as well. The symmetry of Fig. 2 is a good indication
that the FoF groups are as likely to underestimate as
overestimate the velocity dispersion.



4.2 Group centre and projected radius: 
definitions and estimators 

More contentious quantities to define and estimate are the 
centre and the projected radius of a group. Firstly there is 
no unique way to define the group centre (e.g. centre of 
mass (CoM), geometric centre (GC), brightest group/cluster 
galaxy (BCG),...) from which the projected radius is defined. 
Secondly the projected radius definition will depend on what 
fraction of galaxies should be enclosed within it and on what 
assumption is made for the distance to the group. 

To determine the most robust and appropriate defini- 
tions for the centre and projected radius of a group a number 
of schemes were investigated. Hereafter we implicitly assume 
projected radius when referring to the group radius. 

4.2.1 Projected group centre 

For the group centre three approaches were considered.
Firstly, the group centre was defined as the centre of light
(CoL) derived from the $r_{\rm AB}$-band luminosity of all the galaxies
associated with the group, which is an easily observable
proxy for the CoM. Secondly, an iterative procedure
was used where at each step the $r_{\rm AB}$-band CoL was found
and the most distant galaxy rejected. When only two galaxies
remain, the brighter $r_{\rm AB}$-band galaxy is used as the
group centre. We refer to it as Iter. Thirdly, the brightest
group/cluster member (BCG) was assumed to be the group
centre. For mock groups with $N_{\rm FoF} \ge 5$, 95% of the time the
iterative procedure produces the same group centre as the
BCG definition.

Figure 2. Probability distribution function (PDF) of $\log_{10}(\sigma_{\rm FoF}/\sigma_{\rm halo})$, i.e. the log-ratio of the measured/recovered velocity dispersion
($\sigma_{\rm FoF}$) to the intrinsic galaxy velocity dispersion ($\sigma_{\rm halo}$), for best matching FoF/halo mock groups. Each panel shows groups of different
multiplicities, as labelled. The vertical dashed lines indicate where $\sigma_{\rm FoF}$ is a factor 2/5/10 off the intrinsic $\sigma_{\rm halo}$. The more peaked and
centred on 0 the PDF is, the more accurately the underlying $\sigma_{\rm halo}$ is recovered.

Figure 3. Distribution of position offsets between different group centre definitions and the underlying/true group centre for bijectively
matched mock groups. Each panel shows groups of different multiplicities, as labelled. Solid/dashed/dotted lines indicate the
Iter/CoL/BCG centre definitions (see text). The nearly vertical lines at small radii correspond to groups which have a perfectly recovered
centre position (i.e. zero offset). Their fraction is indicated in the panel as "Perfect".

Fig. 3 presents a comparison between three group centre
definitions (Iter, CoL, BCG) and the true/underlying group
centre for the best matching (highest $Q_{\rm tot}$) mock groups.
In this context "true" refers to the centre we obtain when
running the same algorithm on the exact mock group. The
plot shows the distribution of the positional offsets for the
different definitions of group centre when compared to the
"truth" for different group multiplicities, with the fraction
that agrees perfectly stated in each panel for each group
centre definition.

The iterative method always produces the best agreement
for the exact group centre and seems to be slightly
more robust to the effects of group outliers. As should be
expected, the flux weighted CoL definition is the least good
at recovering the underlying/true halo centre position. With
the CoL definition, the group needs to be recovered exactly
to get a perfect match and any small perturbation in membership
influences the accuracy with which the centre is recovered.
This is very different to the BCG or Iter centre
definitions, which are only very mildly influenced by perturbations
in membership.

The iterative centre is therefore preferable over merely
using the BCG: it has a larger precise matching fraction
and a smaller fraction of groups with spuriously large centre
offsets. It is very stable as a function of multiplicity, with
a fraction of precise group centre matches of $\sim$ 90%, as
indicated in the panels of Fig. 3. Hereafter we refer to the
Iter centre definition as the group centre.
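
The iterative (Iter) centre described in this section can be sketched as follows. The example assumes flat projected coordinates and linear luminosities for simplicity, whereas a real catalogue would work with sky positions and angular separations; the tuple layout and function name are hypothetical.

```python
import math


def iterative_centre(galaxies):
    """Return the galaxy chosen as group centre by iterative CoL rejection.

    galaxies: list of (x, y, luminosity) tuples in projected coordinates.
    """
    members = list(galaxies)
    while len(members) > 2:
        total = sum(lum for _, _, lum in members)
        # luminosity-weighted centre of light of the remaining members
        cx = sum(x * lum for x, _, lum in members) / total
        cy = sum(y * lum for _, y, lum in members) / total
        # reject the member furthest from the current centre of light
        farthest = max(members, key=lambda g: math.hypot(g[0] - cx, g[1] - cy))
        members.remove(farthest)
    # of the last two galaxies, the brighter one is the centre
    return max(members, key=lambda g: g[2])


group = [(0.0, 0.0, 10.0), (0.1, 0.0, 5.0), (5.0, 5.0, 1.0), (-0.2, 0.1, 2.0)]
print(iterative_centre(group))
```

Because the result is always one of the member galaxies, a distant interloper such as the third galaxy above is rejected early and does not shift the centre, unlike a plain centre-of-light estimate.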

4.2.2 Radial group centre 

The group centre definitions as considered in §4.2.1 do not
necessarily define what the actual group redshift should be.
One possible solution is to identify it with the redshift of
the central galaxy, as found with the Iter centre definition.
An alternative solution would be to select the group redshift
as the median redshift of the group members. Fig. 4
presents the distribution of the difference between the recovered
median redshift and the intrinsic median redshift
for best matching FoF/halo mock groups. The fraction
of group redshifts that agree precisely is stable as a function
of multiplicity at $\sim$ 55%, and the offset is usually less
than 100 km s$^{-1}$. 80% of the time the redshift differences
are within the GAMA velocity error of $\sim$ 50 km s$^{-1}$ (see
Driver et al. 2011 for details). It is essential to notice that
this radial centre is defined in redshift space (i.e. including
peculiar velocities) as opposed to real space (i.e. based on
Hubble flow redshift), as only information for the former is
available from a redshift survey. A comparison between the
real and the redshift space centre shows directly the importance
and the impact of bulk flow motions, i.e. the galaxy
groups themselves are not at rest.

4.2.3 Projected group radius

The radius definition must be a compromise between containing
a large enough number of galaxies to be stable statistically
and small enough to not be overly biased by or sensitive
to outliers and interlopers (which tend to lie at larger
projected distances). Three radius definitions were considered:
Rad$_{50}$, Rad$_{1\sigma}$ and Rad$_{100}$, containing 50%, 68% and
100% of the galaxies in the group respectively. The latter,
Rad$_{100}$, is mainly used for illustrative purposes, as it is extremely
sensitive to outliers. Rad$_X$ is defined using the default
quantile definition in R, i.e. the group members are
sorted in ascending radius value, assigned a specific percentile
(the most central 0% and the furthest away 100%)
and finally a linear interpolation between the radii of the two
relevant percentiles is performed. This implies that only the
radial distances of the two galaxies bracketing the percentile
definition used are considered in the estimate of Rad$_X$, explaining
why Rad$_{100}$ is expected to be the most sensitive to
outliers.
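
The percentile radius can be written out explicitly. The sketch below reproduces the linear-interpolation rule described above (R's default quantile, commonly called type 7); the function name `rad_x` is hypothetical, and radii are assumed to be projected distances from the group centre.

```python
def rad_x(radii, fraction):
    """Radius enclosing the given fraction of members, by linear interpolation."""
    r = sorted(radii)
    n = len(r)
    position = (n - 1) * fraction  # 0 -> most central member, n - 1 -> furthest
    lower = int(position)
    upper = min(lower + 1, n - 1)
    weight = position - lower
    # only the two members bracketing the percentile contribute
    return r[lower] + weight * (r[upper] - r[lower])


radii = [0.0, 0.2, 0.4, 1.0]
print(rad_x(radii, 0.50), rad_x(radii, 0.68), rad_x(radii, 1.00))
```

For a pair of galaxies where the centre sits on one member, the radii are [0, d] and Rad$_{50}$ is d/2, which illustrates how strongly the estimate depends on membership for the lowest multiplicity groups.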

Fig. 5 shows a comparison between three radius definitions
as measured from the iterative centre for recovered
mock groups (Rad$_{X-{\rm FoF}}$) and for true mock haloes
(Rad$_{X-{\rm halo}}$) for best matching FoF/halo mock groups.
Rad$_{50}$ is marginally more centrally concentrated than Rad$_{1\sigma}$
for all multiplicity subsets and is hence the least affected by
interlopers and outliers.

The subsets plotted in Fig. 5 up to $10 \le N_{\rm FoF} \le 19$
are all median unbiased, although there is a slight high-moment
excess of large radius groups for $2 \le N_{\rm FoF} \le 9$ and
a high-moment excess of erroneously low radius groups for
$10 \le N_{\rm FoF} \le 19$. This does not affect the median of the
distribution, but requires the mean to be offset from the
median in these cases.

The highest multiplicity subset (right most panel of
Fig. 5) has an identifiable excess of low radius groups, leading
to a biased median that is $\sim$ 15% lower than the original
aim. Hence the estimated Rad$_{50-{\rm FoF}}$ for half of the highest
multiplicity groups is underestimated by at least $\sim$ 15%
compared to the corresponding underlying Rad$_{50-{\rm halo}}$. We
note however that this definition still behaves better than
either of the other two considered.

Whilst the accuracy of the measured velocity dispersion
noticeably improves as a function of multiplicity (see Fig. 2),
the accuracy of the observed radius does not. This observation
should be expected since groups have their centres
iterated towards the optimal solution. During this process
they, in effect, become lower multiplicity as the outliers are
removed, and thus will suffer from similar numerical artefacts.

Based on the improvement in radius agreement for
$N_{\rm FoF} \ge 5$, Rad$_{50}$ was selected as the preferred definition
of radius for use in the GAMA galaxy group catalogue. For
the remainder of this paper, and in any future discussion of
GAMA galaxy groups, any mention of group radius implicitly
refers to Rad$_{50}$. However it is to be noted that Rad$_{1\sigma}$
is better behaved for low multiplicity groups ($N_{\rm FoF} \le 4$), as
the "bumps" at $\pm 0.3$ in the left most panel of Fig. 5 have
vanished nearly completely in that case. The origin of these
two spikes becomes clear in the discussion of Fig. 6.

4.3 Dynamical group mass estimator and 
calibration 

Once an unbiased and robust group velocity dispersion and
a nearly unbiased group radius can be estimated, the final
step is to combine this information into a dynamical mass
estimator. To first order for a virialized system we expect
its dynamical mass to scale as $M \propto \sigma^2 R$, where $\sigma$ and $R$
are calculated as described in §4.1 and §4.2.

To understand any correlated biases in the estimates of
these two fundamental group properties, we plot in Fig. 6
the group density distribution as a function of the relative
accuracy of the recovered group radius (x-axis) and
the square of the group velocity dispersion (y-axis). More
precisely Fig. 6 shows the group density distribution in the
$\log_{10}({\rm Rad}_{X-{\rm FoF}}/{\rm Rad}_{X-{\rm halo}})$ - $\log_{10}(\sigma_{\rm FoF}/\sigma_{\rm halo})^2$ plane, split
as a function of redshift and multiplicity, with ranges specified
in each panel. The green dashed lines delineate regions
where $\sigma_{\rm FoF}^2\,{\rm Rad}_{50-{\rm FoF}}$ is 2/5/10 times off the expectation
given by $\sigma_{\rm halo}^2\,{\rm Rad}_{50-{\rm halo}}$, reflecting to some extent the implied
uncertainty on any dynamical mass estimate. As a
matter of fact, if the dynamical mass is proportional to $\sigma^2 R$
as expected for a virialized system and can be directly estimated
from $\sigma_{\rm halo}^2\,{\rm Rad}_{50-{\rm halo}}$, then the green dashed lines
indicate by what amount the halo mass as inferred from
$\sigma_{\rm FoF}^2\,{\rm Rad}_{50-{\rm FoF}}$ deviates from the true one (assuming the
same proportionality factor). Additionally any asymmetry
in the density distribution w.r.t. those guide lines is a sign
of a bias in the inferred mass: a density excess in the top-right/bottom-left
of any panel indicates a bias towards incorrectly
high/low dynamical masses. Note that a density
excess orthogonal to these lines is not problematic for the
mass estimates since the individual biases cancel out in this
parametrisation.

Figure 4. Probability distribution function (PDF) of $z_{\rm FoF} - z_{\rm halo}$ for best matching FoF/halo mock groups, where $z$ is the median
redshift of the group. Each panel shows groups of different multiplicities, as labelled. The fraction of exact matches is indicated in each
panel, as "Perfect".

Figure 5. Probability distribution function (PDF) of $\log_{10}({\rm Rad}_{X-{\rm FoF}}/{\rm Rad}_{X-{\rm halo}})$, i.e. the log-ratio of the measured/recovered radius
(Rad$_{X-{\rm FoF}}$) to the intrinsic galaxy radius (Rad$_{X-{\rm halo}}$), for best matching FoF/halo mock groups. Each panel shows groups of different
multiplicities, as labelled. Solid/dashed/dotted lines indicate the Rad$_{50}$, Rad$_{1\sigma}$ and Rad$_{100}$ radii definitions respectively encompassing
50%, 68% and 100% of the galaxies in the group. The solid line, Rad$_{50}$, produces the tightest distribution of the three considered. The
vertical dashed lines indicate where Rad$_{X-{\rm FoF}}$ is a factor 2/5/10 off the intrinsic Rad$_{X-{\rm halo}}$.

Figure 6. 2-D density distribution of the best matching FoF/halo mock groups in the $\log_{10}({\rm Rad}_{X-{\rm FoF}}/{\rm Rad}_{X-{\rm halo}})$ - $\log_{10}(\sigma_{\rm FoF}/\sigma_{\rm halo})^2$
plane, split as a function of redshift and multiplicity (top and bottom panel respectively). The x and y-axes show the relative accuracy of
the recovered radius and velocity dispersion (squared) respectively. The contours represent the regions containing 10/50/90% of the data
for three magnitude limits, i.e. $r_{\rm AB} \le 19.0$ (black), $r_{\rm AB} \le 19.4$ (red) and $r_{\rm AB} \le 19.8$ (blue). The green dashed lines delineate regions
where $\sigma_{\rm FoF}^2\,{\rm Rad}_{50-{\rm FoF}}$ is 2/5/10 times off the expectation given by $\sigma_{\rm halo}^2\,{\rm Rad}_{50-{\rm halo}}$, reflecting to some extent the implied uncertainty
on any dynamical mass estimate (see text for details).

As a function of redshift the density distributions in
Fig. 6 are well behaved. As a function of multiplicity the
main effect is a tightening of the distribution, which is expected
since the velocity dispersion and, to a lesser degree,
the radius can be better estimated with more galaxies. The
$5 \le N_{\rm FoF} \le 9$ multiplicity range shows some small bias towards
high dynamical masses (the 90% contour wing) whilst
the highest multiplicity subset ($20 \le N_{\rm FoF} \le 1000$) appears
to be biased to slightly low dynamical masses (offset for
10% and 50% contour wings). Overall the biases are small
for $N_{\rm FoF} \ge 5$ multiplicity groups, and in the tails of the distributions
rather than in the median or the mode. However,
for low multiplicity groups ($N_{\rm FoF} \le 4$) the situation is rather
different. First of all, there is an extensive scatter in the recovered
velocity dispersion at $\log_{10}({\rm Rad}_{X-{\rm FoF}}/{\rm Rad}_{X-{\rm halo}}) = \pm 0.3$.
This is entirely related to the "bumps" seen in Fig. 5
and is due to mismatches in the grouping, explaining why
the velocity dispersions are so poorly recovered for some of
those systems. The reason for an overdensity of groups at
$\pm 0.3$ (i.e. half/double the underlying radius) is related to the
way Rad$_{50}$ works. When a $N_{\rm FoF} = 2$ group misses one member
and when a $N_{\rm FoF} = 3$ group contains one interloper this
results most often in a FoF group where the calculated group
centre is the same (because the group centre is so accurately
recovered, see Fig. 3) but the radius is half and double the
halo radius respectively. Additionally any asymmetry seen
in the top panels of Fig. 6 can be attributed to low multiplicity
groups. Generally Fig. 6 gives us confidence that
measurement errors in $\sigma$ and $R$ are not highly correlated.
The dynamical mass of a system is estimated using

$\frac{M_{\rm FoF}}{h^{-1}\,M_\odot} = \frac{A}{G/(M_\odot^{-1}\,{\rm km}^2\,{\rm s}^{-2}\,{\rm Mpc})} \left(\frac{\sigma_{\rm FoF}}{{\rm km\,s}^{-1}}\right)^2 \frac{{\rm Rad}_{\rm FoF}}{h^{-1}\,{\rm Mpc}}$,   (18)

where $G$ is the gravitational constant in suitable units, i.e.
$G = 4.301 \times 10^{-9}\,M_\odot^{-1}\,{\rm km}^2\,{\rm s}^{-2}\,{\rm Mpc}$. $A$ is the scaling factor
required to create a median unbiased mass estimate of
$M_{\rm DM}/M_{\rm FoF}$. For a 'typical' cluster with a 1 Mpc radius
and a velocity dispersion of 1000 km s$^{-1}$, the mass given by
Eq. 18 is $\sim 2A \times 10^{14}\,M_\odot$. $A$ is likely to be larger than
unity, since the estimated velocity dispersion using Eq. 16
traces the velocity dispersion along the line-of-sight only
(for isotropic systems $\sigma_{\rm 3D} \simeq \sqrt{3}\,\sigma_{\rm 1D}$)
and the average projected radius is smaller than the average
intrinsic radius. (For isotropic systems the relation depends on the exact radius
definition. Conceptually the 3D and 2D radius will agree for
Rad$_{100}$ but increasingly disagree as the radius measured becomes
smaller due to the relative concentration of objects towards the
centre when observing a projected 2D, as opposed to intrinsic 3D,
distribution.) Finally, Eq. 18 can only be truly valid for a
system in virial equilibrium, which many of our systems will
not necessarily be. Hence the best approach is to determine
$A$ in a semi-empirical manner by requiring it to produce a
median unbiased halo mass estimate when comparing best
matching FoF/halo mock groups.

Performing a single global optimisation using all bijectively
matched groups with $N_{\rm FoF} \ge 5$ results in $A = 10.0$.
This is somewhat different to the $A = 5$ factor found in
Eke et al. (2004). This should not be surprising since there
are differences in the style of grouping optimisation, and we
have used a more compact definition of the group radius and
a different approach to recovering the group centre. It is interesting
to note that this scaling of $A = 10.0$ is identical to
the dynamical mass scaling found in Chilingarian & Mamon
(2008) for calculating the virial mass of dwarf galaxies.

Fig. 7 compares the median globally calibrated dynamical
masses to the underlying DM halo mass for best matching
FoF/halo mock groups (using $A = 10.0$). Whilst the distribution
is globally unbiased for $N_{\rm FoF} \ge 5$ (by definition),
small deviations as a function of redshift and/or multiplicity
are evident. Offsets from the median line are evident at all
multiplicities, but strongest for low multiplicity systems (i.e.
$2 \le N_{\rm FoF} \le 4$ groups in Fig. 7). The small biases become
more apparent at higher redshifts, driven by the average observed
group multiplicity dropping as a function of redshift
and the average mass increasing. To gauge how sensitive the
scaling factor $A$ is to the specific subset of data considered,
combined cuts in redshift and multiplicity were made. Table 3
contains the various $A$ factors required for the different
subsets as a function of the possible limiting magnitudes for
the GAMA group catalogue.
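
The dynamical mass estimator of Eq. 18 amounts to a one-line calculation once the units are fixed. The sketch below assumes $\sigma$ in km s$^{-1}$ and the radius in $h^{-1}$ Mpc, uses the value of $G$ quoted in the text, and defaults to the global calibration $A = 10.0$; the function name is hypothetical.

```python
G_UNITS = 4.301e-9  # G in Msun^-1 km^2 s^-2 Mpc, as quoted in the text


def dynamical_mass(sigma_kms, radius_mpc, scale_a=10.0):
    """Eq. 18: M = (A / G) * sigma^2 * R, in h^-1 Msun for R in h^-1 Mpc."""
    return scale_a / G_UNITS * sigma_kms ** 2 * radius_mpc


# 'Typical' cluster from the text: 1000 km/s and 1 Mpc with A = 1
print(dynamical_mass(1000.0, 1.0, scale_a=1.0))  # roughly 2.3e14
```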

Figure 7. 2-D density distribution of best matching FoF/halo mock groups in the $M_{\rm FoF}$-$M_{\rm DM}$ plane, split as a function of redshift and
multiplicity (top and bottom panel respectively). These panels objectively compare the recovered group masses to the underlying DM
halo masses. The contours represent the regions containing 10/50/90% of the data for three magnitude limits, i.e. $r_{\rm AB} \le 19.0$ (black),
$r_{\rm AB} \le 19.4$ (red) and $r_{\rm AB} \le 19.8$ (blue). The dots indicate the exact $M_{\rm FoF}$-$M_{\rm DM}$ pairs. For $M_{\rm FoF}$ we use Eq. 18 and $A = 10.0$. The
green dashed lines delineate regions where $M_{\rm FoF}$ is 2/5/10 times off the underlying $M_{\rm DM}$.

$N_{\rm FoF}$ range              2 to 4              5 to 9              10 to 19            20 to 1000
$r_{\rm AB}$ limit               19.0  19.4  19.8    19.0  19.4  19.8    19.0  19.4  19.8    19.0  19.4  19.8
$z_{\rm FoF} \le 0.1$            20.0  19.0  18.0    11.8  10.8  10.9    11.4  12.0  11.5    12.1  12.6  12.7
$0.1 \le z_{\rm FoF} \le 0.2$    20.2  19.5  19.2    10.3  10.5  10.7    11.0  11.1  10.9     9.2  10.4  10.9
$0.2 \le z_{\rm FoF} \le 0.3$    21.2  21.5  19.8     9.0  10.3  11.2     8.0   8.6   9.9     6.7   8.3   9.6
$0.3 \le z_{\rm FoF} \le 0.5$    13.6  17.4  17.8     4.4   6.1   7.9     3.5   5.4   6.7     4.8   5.6   6.9

Table 3. Values of $A$, the dynamical mass scaling factor of Eq. 18 required to create an unbiased median mass estimate for different
disjoint subsets of bijectively matched groups.

Using the data in Table 3 the best fitting plane that
accounts for the variation of $A$ as a function of $\sqrt{N_{\rm FoF}}$ and
$\sqrt{z_{\rm FoF}}$ is calculated. To prevent strong biases to low $N_{\rm FoF}$
systems purely by virtue of their overwhelming numbers, the
plane was not weighted by frequency and should produce
the appropriate corrections throughout the parameter space
investigated. The plane function for $A$ is given by

$A(N_{\rm FoF}, z_{\rm FoF}) = A_c + \frac{A_N}{\sqrt{N_{\rm FoF}}} + \frac{A_z}{\sqrt{z_{\rm FoF}}}$,   (19)

where $A_c$, $A_N$ and $A_z$ are constants to be fitted. Table 4
contains the parameters that produce the best fitting planes
for the three different GAMA magnitude limits. The motivation
for the functional form is mainly the need to ensure
positivity of $A(N_{\rm FoF}, z_{\rm FoF})$ over the range of GAMA multiplicities
and redshifts, and a good fit to the data within these
limits. The errors shown in Table 4 are estimated from finding
the best fitting plane for the 9 mock GAMA volumes
separately and measuring the standard deviation of the individual
best fitting planes, much like the approach used for
Table 2.
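
As an illustration, the plane of Eq. 19 can be evaluated as below. The sketch assumes the functional form $A_c + A_N/\sqrt{N_{\rm FoF}} + A_z/\sqrt{z_{\rm FoF}}$ and takes the central values of $(A_c, A_N, A_z)$ from Table 4, ignoring their quoted errors; the dictionary and function names are hypothetical.

```python
import math

# (A_c, A_N, A_z) central values from Table 4, keyed by the r_AB magnitude limit
PLANE = {
    19.0: (-4.3, 22.5, 3.1),
    19.4: (-1.2, 20.7, 2.3),
    19.8: (2.0, 17.9, 1.5),
}


def scale_factor(n_fof, z_fof, mag_limit=19.4):
    """Mass scaling factor A for a group of given multiplicity and redshift."""
    a_c, a_n, a_z = PLANE[mag_limit]
    return a_c + a_n / math.sqrt(n_fof) + a_z / math.sqrt(z_fof)


print(scale_factor(4, 0.25))  # -1.2 + 20.7/2 + 2.3/0.5
```

With this form both variable terms are positive and decrease with increasing multiplicity and redshift, in line with the trend of the values in Table 3.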



4.3.1 Mass estimate scatter

It is important to highlight that even though the observed 
dynamical mass estimates and halo masses are well corre- 
lated (in particular the scatter is approximately mirrored 
across the 1-1 line in Fig. [71), it is impossible to select an 
unbiased subset of mass unless the selection is across the 



Ac 

TAB ^ 19.0 -4.3 ± 3.1 22.5 ± 1.7 3.1 ± 1.1 

TAB ^ 19.4 -1.2 ± 1.7 20.7 ± 1.4 2.3 ± 0.6 

TAB ^ 19.8 +2.0 ± 1.4 17.9 ± 1.1 1.5 ± 0.4 

Table 4. Table of parameters that create the best fitting plane 
to the data in Table O The plane is a function of group redshift 
and multiplicity, as given in Eq. 1191 Errors are estimated from 
running plane fits to the 9 mock GAMA volumes separately and 
measuring the standard deviation of the individual best fitting 
planes. 



mode of the distribution. This is due to Eddington bias
rather than any intrinsic issue with the mass estimates.
Most haloes in GAMA will have moderate masses
(10^^ M⊙), so if simple Gaussian scatter in the mass estimates
is assumed, then a high mass subset must contain a
larger fraction of lower mass haloes scattered up in mass, and
a low mass subset must contain a larger fraction of higher
mass haloes scattered down in mass; hence the medians are
biased. This effect differs from Malmquist bias, which describes
the observational bias in the distribution of halo masses
as a function of distance.
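
The selection effect described above can be illustrated with a toy Monte Carlo. This is only a sketch under stated assumptions: the intrinsic distribution is taken to be Gaussian in log10(mass), and its centre (13.0), its width (0.5 dex) and the 0.3 dex scatter are hypothetical values chosen for illustration, not numbers taken from the mocks.

```python
import random
import statistics

def eddington_demo(n_haloes=100_000, scatter_dex=0.3, seed=1):
    """Toy Monte Carlo of Eddington bias in log10(mass)."""
    rng = random.Random(seed)
    # Hypothetical intrinsic distribution: Gaussian in log10(M), mode at 13.0.
    true_logm = [rng.gauss(13.0, 0.5) for _ in range(n_haloes)]
    # Median-unbiased Gaussian scatter added to each log-mass estimate.
    obs_logm = [m + rng.gauss(0.0, scatter_dex) for m in true_logm]

    def median_offset(lo, hi):
        # Median of (observed - true) for haloes selected on OBSERVED mass.
        sel = [o - t for o, t in zip(obs_logm, true_logm) if lo <= o < hi]
        return statistics.median(sel)

    return {
        "low": median_offset(11.0, 12.5),    # subset below the mode
        "mode": median_offset(12.9, 13.1),   # subset straddling the mode
        "high": median_offset(13.5, 15.0),   # subset above the mode
    }

offsets = eddington_demo()
for name, value in offsets.items():
    print(f"{name:>4s} observed-mass bin: median(obs - true) = {value:+.3f} dex")
```

In this toy model the high-mass selection is biased high and the low-mass selection is biased low, while a selection across the mode is close to unbiased, even though every individual estimate is median unbiased.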

This effect can be modelled quite accurately by assuming
a median unbiased log-normal relative error in
the mass estimate, where the standard deviation of the
distribution (M_err) is a function of system multiplicity. The
effect multiplicity has on the accuracy of the mass can be
seen clearly in Fig. 8: although the estimate is median unbiased for
N_FoF ≥ 4, the standard deviation of the distribution decreases
strongly as a function of multiplicity. The approximate
function for this effect is given by



$$\log_{10}(M_{\rm err}) = 1.0 - 0.43\,\log_{10}(N_{\rm FoF}) \tag{20}$$



where the appropriate range of use is 2 ≤ N_FoF ≤ 50, beyond
which the standard deviation is ≈ 0.27. We recast this error
function back onto the intrinsic mock halo masses to give a
new mass with simulated dynamical mass errors:



$$\log_{10}(M_{\rm sim}) = G\left(\log_{10}(M_{\rm DM}),\ \log_{10}(M_{\rm err})\right) \tag{21}$$



where G(x, μ) is a random sample from the normal distribution
with mean x and standard deviation μ. Fig. 9 shows
how the intrinsic halo mass compares with the same halo
masses after our fiducial error function is applied. This
reproduces the main contour twisting features described above;
particularly clear is the sampling bias expected when
selecting groups based on the observed halo masses. For instance,
the manner in which the mode of the contours appears
to be more vertical than the 1-1 line in Fig. 7 (the
slight rotation of the contours) is well replicated in Fig. 9
and can be explained by the random scatter of the measured
dynamical mass about the intrinsic halo mass.
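
A minimal sketch of the error model follows. It assumes that the scatter in dex is 1.0 − 0.43 log10(N_FoF), held constant beyond N_FoF = 50, and that the simulated mass is a median unbiased log-normal draw about the intrinsic halo mass. The function names are hypothetical and the code is not the procedure used to build Fig. 9.

```python
import math
import random
import statistics

def log_mass_scatter(n_fof):
    """Assumed scatter (dex) of the mass estimate; fixed outside 2 <= N_FoF <= 50."""
    n = min(max(n_fof, 2), 50)
    return 1.0 - 0.43 * math.log10(n)

def simulate_mass(m_dm, n_fof, rng):
    """Scatter an intrinsic halo mass with median-unbiased log-normal errors."""
    return m_dm * 10.0 ** rng.gauss(0.0, log_mass_scatter(n_fof))

def empirical_scatter(n_fof, rng, n_draws=20000):
    """Measure the scatter (dex) recovered from many simulated masses."""
    m_dm = 1.0e13  # arbitrary illustrative halo mass
    logs = [math.log10(simulate_mass(m_dm, n_fof, rng) / m_dm) for _ in range(n_draws)]
    return statistics.pstdev(logs)

rng = random.Random(42)
for n_fof in (2, 5, 20, 50):
    print(n_fof, round(log_mass_scatter(n_fof), 2), round(empirical_scatter(n_fof, rng), 2))
```

The printed pairs compare the assumed scatter with the scatter measured from the simulated masses at each multiplicity.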



4.4 Total group luminosity estimator 

The total group luminosity is an equally important global
group property. It should not be just the total luminosity
of the observed group members but the total luminosity
extrapolated to an arbitrarily faint absolute magnitude limit,
in order to address residual selection effects. To do this
we calculate the effective absolute magnitude limit of each
group, measure the r_AB-band luminosity contained within
this limit and then integrate the global GAMA galaxy LF
(see §2.3) to a nominal faint limit to correct for the
missing flux. Explicitly, for each group we calculate the following:



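$$L_{\rm FoF} = B\,L_{\rm ob}\,\frac{\int_{-30}^{-14} 10^{-0.4 M_r}\,\phi_{\rm GAMA}(M_r)\,dM_r}{\int_{-30}^{M_{r\text{-lim}}} 10^{-0.4 M_r}\,\phi_{\rm GAMA}(M_r)\,dM_r} \tag{22}$$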



where L_ob is the total observed r_AB-band luminosity of the
group, B is the scaling factor required to produce a perfectly
median unbiased luminosity estimate and M_r-lim is the effective
r_AB-band absolute magnitude limit for the group.
This limit depends on the redshift of observation and the apparent
magnitude limit used. Corrections are only a few
percent at low redshift when using r_AB ≤ 19.8 and can
become factors of a few at z_FoF ≈ 0.5. To convert magnitudes
into solar luminosities we take the r_AB-band absolute
magnitude of the Sun to be M_r,⊙ = 4.67. The limits of
−30 ≤ M_r ≤ −14 used in the numerator of Eq. 22 are effectively
limits of −∞ ≤ M_r ≤ ∞, since the luminosity density
of a typical LF is nearly all recovered within a couple of
magnitudes of M*. Assuming the Schechter function parameters
of Blanton et al. (2003) we would expect to retrieve 99.5%
of the intrinsic flux using these limits, assuming the LF continues
down to infinitely faint galaxies. More practically, the
bright limit (M_r ≥ −30) is much brighter than any known
galaxy, and the faint limit (M_r ≤ −14) is the limit of the
GAMA SWML LF used for this work, and thus is also the
effective limit of the mock catalogues used since the galaxy
luminosities were adjusted to return the GAMA LF.

Since the median redshift of GAMA is z ≈ 0.2 and
the apparent magnitude limit is at least r_AB = 19.4, most
groups will contain members faintwards of M* (with
M* − 5 log10 h = −20.44; Blanton et al. 2003). Because the
luminosity density is dominated by galaxies around M*, the
extrapolation required to get a total group luminosity will
be quite conservative, since most groups are sampled well
beyond M*.
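
The size of the extrapolation can be sketched numerically. The example below is illustrative only: it substitutes a single Schechter function for the GAMA SWML LF, uses M* − 5 log10 h = −20.44 as quoted above, and assumes a faint-end slope of α = −1.05, which is not stated in the text. The function names are hypothetical.

```python
import math

M_STAR = -20.44  # M* - 5 log10 h, as quoted from Blanton et al. (2003)
ALPHA = -1.05    # faint-end slope: an assumed value, not given in the text

def lum_weighted_lf(mag):
    """Schechter LF times luminosity, per unit magnitude (arbitrary normalisation)."""
    x = 10.0 ** (-0.4 * (mag - M_STAR))       # L / L*
    return x ** (ALPHA + 2.0) * math.exp(-x)  # proportional to phi(M) * 10^(-0.4 M)

def integrate(func, lo, hi, steps=4000):
    """Composite trapezoidal rule on a uniform grid."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    total += sum(func(lo + i * h) for i in range(1, steps))
    return total * h

def extrapolation_factor(mag_limit, bright=-30.0, faint=-14.0):
    """Ratio of light down to `faint` over light down to the group limit `mag_limit`."""
    numerator = integrate(lum_weighted_lf, bright, faint)
    denominator = integrate(lum_weighted_lf, bright, mag_limit)
    return numerator / denominator

for mag_limit in (-16.0, -18.0, -20.0):
    print(mag_limit, round(extrapolation_factor(mag_limit), 3))
```

A group whose effective limit is already at the faint limit gets a factor of one, and the factor grows as the effective limit moves brightwards towards M*.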

This process assumes that a global LF is appropriate for
all groups over a range of masses and environments, which is
known not to be the case (e.g. Eke et al. 2004; Croton et al.
2005; Robotham et al. 2006). However, since the median
luminosity scaling is less than a factor of 1.6, the difference that
adjusting to halo specific LFs would make to the integrated
light will usually be smaller than the statistical scatter
observed (which is many tens of percent).

Performing a single global optimisation using all bijectively
matched groups with N_FoF ≥ 5 results in B = 1.04.
This number accounts for a number of competing effects:
the shape of the faint end slope (α) and the characteristic
magnitude (M*) varying between grouped environments
and the global average, and the effects of interloper flux
biasing the extrapolated group luminosities. Overall the effects
are rather small, and globally we see a value close to
1, which implies neither a large amount of under-grouping
nor over-grouping.

Fig. 10 compares the inferred total group luminosity
(L_FoF) to the underlying mock luminosity (L_mock) for best
matching FoF/mock galaxy groups. The typical scatter as a
function of mock group luminosity is quite constant regardless
of group multiplicity, with only an excessive amount of
scatter for the lowest multiplicity groups, as evidenced in
the bottom left panel of Fig. 10. The relations are mostly
unbiased, except for the two higher redshift samples (top
right panels of Fig. 10).

The scatter in extrapolated group luminosity is much
smaller than seen for dynamical masses in Fig. 7. This is
expected since fewer observables are required in its estimate
and the effect of interlopers is much smaller. By their
nature, interlopers are more likely to systematically affect
"geometrical" quantities, like biasing the observed velocity
dispersion and radius, while having a lesser impact on e.g.
total luminosities. This is because the nature of the opti-



Figure 8. Relative difference between measured and underlying group masses as a function of multiplicity for different redshift subsets.
The improvement in the measurement of the velocity dispersion and the radius tightens the distribution until N_FoF ∼ 50. The lines
represent the 3 survey depths of interest: r_AB ≤ 19.0 (black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue). For M_FoF we use Eq. 18 and
A = 10.0.




Figure 9. As Fig. 7 but for the simulated relation between M_DM and M_sim (M_DM with the expected random errors applied using Eq. 21),
modelling the expected error just as a function of group multiplicity. The contours represent the regions containing 10/50/90% of the
data for three different magnitude limits, r_AB ≤ 19.0 (black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue). M_sim is estimated using Eq. 20.



mal grouping used for this work means that on average we 
should miss as many true group galaxies as add interlop- 
ers, so the net loss and gain of galaxy luminosities tend to 
balance out. 

As with the dynamical mass estimates, scaling factors,
listed in Table 5, are calculated for various redshift and
multiplicity subsets in order to properly quantify outstanding
biases that remain after scaling the observed luminosities to
account for galaxies below the survey flux limit. They are
distributed around unity, which is what we would expect if
the extrapolated flux fully accounts for all of the missing
flux. The variation in the median seen in the table is larger

Figure 10. 2-D density distribution of best matching FoF/halo mock groups in the L_FoF-L_mock plane, split as a function of redshift and
multiplicity (top and bottom panel respectively). These panels objectively compare the recovered group luminosities to the underlying
total luminosity in the mocks. The contours represent the regions containing 10/50/90% of the data for three magnitude limits, i.e.
r_AB ≤ 19.0 (black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue). The dots indicate the exact L_FoF-L_mock pairs. The green dashed lines
delineate regions where L_FoF is 2/5/10 times off the underlying L_mock. For L_FoF we use Eq. 22 and B = 1.04.



than seen for the dynamical mass scaling factors. This is because
we have applied a global LF correction to the data and
the LF is known to vary strongly as a function of group
environment (e.g. Robotham et al. 2006). Since we are naturally
more sensitive to higher mass groups at higher redshifts, this
explains the strong redshift gradient in the scaling factor required,
and in comparison the multiplicity variation is very small.
For the dynamical A factors the dominant variable was the
group multiplicity. When using the groups this is an important
consideration: the group dynamical masses are more
intrinsically stable (require smaller corrections) as a function
of redshift, whilst group luminosities are more stable as
a function of multiplicity.

As with the dynamical masses, the total group luminosity
correction factors (B) can be well described by a plane
that fits Table 5, viz.

$$B(N_{\rm FoF}, z_{\rm FoF}) = B_c + \frac{B_N}{\sqrt{N_{\rm FoF}}} + \frac{B_z}{\sqrt{z_{\rm FoF}}}, \tag{23}$$

where B_c, B_N and B_z are constants to be fitted. Table 6
contains the parameters that produce the best fitting
planes for the three different GAMA magnitude limits. The
errors shown in Table 6 are estimated from finding the best
fitting plane for the 9 mock GAMA volumes separately and 
measuring the standard deviation of the individual best fit- 
ting planes, much like the approach used for Table O 



                  B_c             B_N             B_z

r_AB ≤ 19.0   +1.27 ± 0.38   -0.67 ± 0.25   0.08 ± 0.10
r_AB ≤ 19.4   +0.94 ± 0.12   -0.67 ± 0.11   0.16 ± 0.04
r_AB ≤ 19.8   +0.65 ± 0.06   -0.50 ± 0.06   0.22 ± 0.02

Table 6. Table of parameters that create the best fitting plane
to the data in Table 5. The plane is a function of group redshift
and multiplicity, as given in Eq. 23. Errors are estimated from
running plane fits to the 9 mock GAMA volumes separately and
measuring the standard deviation of the individual best fitting
planes.
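
As an illustration, the plane can be evaluated directly from the tabulated best-fit values. This is a minimal sketch assuming the functional form B = B_c + B_N/sqrt(N_FoF) + B_z/sqrt(z_FoF); the function and dictionary names are hypothetical and the quoted errors on the coefficients are ignored.

```python
import math

# (B_c, B_N, B_z) best-fit values from Table 6, keyed by the r_AB magnitude limit.
TABLE6 = {
    19.0: (1.27, -0.67, 0.08),
    19.4: (0.94, -0.67, 0.16),
    19.8: (0.65, -0.50, 0.22),
}

def lum_scale_factor(n_fof, z_fof, mag_limit=19.4):
    """Luminosity scaling factor B for a group of given multiplicity and redshift."""
    b_c, b_n, b_z = TABLE6[mag_limit]
    return b_c + b_n / math.sqrt(n_fof) + b_z / math.sqrt(z_fof)

# A five-member group at z = 0.2 in an r_AB <= 19.4 region.
print(round(lum_scale_factor(5, 0.2), 3))
# The 1/sqrt(z) term diverges as z -> 0, so very low redshifts need care.
print(round(lum_scale_factor(5, 0.001), 3))
```

The second call shows why the plane should not be extrapolated to very small redshifts: the 1/sqrt(z_FoF) term dominates and the factor grows without bound.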



4.5 Group mass and light 

The M/L observed in groups is a fundamental property of
interest in the analysis of galaxy groups. It is obviously important
that any intrinsic scatter in the estimates of both
mass and luminosity of groups is not strongly correlated.

Fig. 11 shows the observed fidelity of the group dynamical
masses compared to the total group luminosities
for a variety of data subsets. Encouragingly the dynamical
mass and luminosity estimates do not correlate strongly in
any direction. The most significant concern would be strong
scatter along the −45° direction, since this would mean that
the dynamical mass estimates tend to be erroneously small
when the luminosity estimates tend to be erroneously large
(creating a very small M/L ratio) and vice versa. Instead






                     2 ≤ N_FoF ≤ 4      5 ≤ N_FoF ≤ 9      10 ≤ N_FoF ≤ 19    20 ≤ N_FoF ≤ 1000
r_AB ≤               19.0  19.4  19.8   19.0  19.4  19.8   19.0  19.4  19.8   19.0  19.4  19.8

0 ≤ z_FoF ≤ 0.1      1.1   1.1   1.1    1.1   1.1   1.1    1.4   1.3   1.2    1.8   1.7   1.6
0.1 ≤ z_FoF ≤ 0.2    1.0   1.0   1.0    1.1   1.0   1.0    1.2   1.1   1.1    1.3   1.2   1.2
0.2 ≤ z_FoF ≤ 0.3    1.0   0.9   0.9    1.1   1.0   0.9    1.2   1.0   1.0    1.2   1.1   1.0
0.3 ≤ z_FoF ≤ 0.5    1.1   0.8   0.7    1.2   0.9   0.7    1.5   1.0   0.8    1.1   1.2   0.9

Table 5. Values of B, the luminosity scaling factor of Eq. 22, required to create an unbiased median halo luminosity estimate for different
disjoint subsets of bijectively matched groups.

Figure 11. Comparison of the fidelity of the recovered group mass (x-axis) against the group luminosity (y-axis), split as a function of
redshift and multiplicity (top and bottom panel respectively). For both axes only a global median correction optimized for N_FoF ≥ 5
groups is applied, i.e. we use Eq. 18 and Eq. 22 with A = 10.0 and B = 1.04 for the mass and luminosity estimates respectively. The
vertical (horizontal) green dashed lines represent accuracy factors of 2/5/10 for mass (luminosity) estimates. The contours represent the
regions containing 10/50/90% of the data for three different magnitude limits, r_AB ≤ 19.0 (black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8
(blue).



the two group measurements show no strong correlations in 
the accuracy of their recovery. 

To demonstrate the improvement witnessed when using
the multiplicity and redshift scaling relations, Fig. 12 compares
side by side the scatter expected for a simple median
correction for N_FoF ≥ 5 (left panel) and for a redshift and
multiplicity dependent correction (right panel). The dynamical
mass and luminosity scaling corrections use Eq. 19 and
Eq. 23 with parameters listed in Tables 4 and 6 respectively.
The scatter in the recovered luminosity is significantly reduced
in the right panel.

It is clear that using the full multi-parameter scaling
relations offers an improved distribution of mass and luminosity
scatter, as well as creating extremely unbiased medians
for the distributions. The three apparent magnitude
limits used are brought into closer alignment after applying
the correction, and the amount of scatter is reduced.
The most significant change is for the 90% contour for high
L_FoF/L_mock, where we see the contours tighten into very
close agreement once the correction is made. This means
that groups extracted from regions of different depths (e.g.
G09 and G15 versus G12) can be compared more directly.
It is also clear that the mode and median are brought into
better agreement, moving up towards L_FoF/L_mock = 1.

Depending on the precise science goal, the full scaling
equations should be used. Particular cases would be any
comparison of extremely dissimilar groups over a large redshift
baseline. However, in small volume limited samples a
simple median correction factor might be desirable. This is
particularly true at small redshift, where the asymptotic nature
of the plane function used could produce spurious results.

Figure 12. Comparison of the fidelity of the recovered group
mass (x-axis) against the group luminosity (y-axis). The left panel
uses only a global median correction for mass and luminosity,
optimized for N_FoF ≥ 5 groups (i.e. Eq. 18 and Eq. 22 with A = 10.0
and B = 1.04). The right panel uses the redshift and multiplicity
dependent scaling functions of Eq. 19 and Eq. 23 with parameters
listed in Tables 4 and 6 respectively. The green dashed lines
show measurement accuracy factors of 2/5/10 for the mass and
luminosity separately. The contours represent the regions containing
10/50/90% of the data for three different magnitude limits,
r_AB ≤ 19.0 (black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue).



4.6 Quality of grouping 

The accuracy with which the galaxy composition of a group 
is recovered is a distinct issue, but nevertheless equally im- 
portant as the precise recovery of intrinsic group proper- 
ties, as considered in ^4.H - ^^^ For instance, even a group 
that has been perfectly recovered might produce an incorrect 
mass estimate, the latter depending on the exact observed 
configuration of galaxies on the sky and not solely on the 
group membership. Using Q_tot, as defined by Eq. 14 in §3.1,
as our definition of grouping quality, we can investigate how
different aspects of grouping affect the purity of the observed
systems.

Fig. 13 and Fig. 14 show how Q_tot and E_tot vary within
different group subsets for best matching FoF/halo mock
groups. The grouping optimisation was not done with the
whole sample; rather, only groups with N_FoF ≥ 5 contributed
to the cost function. Hence panels that contain
groups of lower multiplicity (i.e. 2 ≤ N_FoF ≤ 4) did not
drive the optimisation, but demonstrate the consequence of
it.

The parameter that best constrains the group quality
is the multiplicity, where the spread in observed grouping
quality reduces for higher multiplicity systems. The most
accurate groups tend to be at redshifts z ≈ 0.2 and have low
multiplicities. This is to be expected since the global optimisation
considered will naturally be drawn to the regime
where most groups are. That said, the bijective fraction of
recovered groups is best for high multiplicity systems and
remains very steady with redshift. The overall effect is that
groups are more likely to be unambiguously discovered (i.e.
bijective) when N_FoF is high (middle panels of Fig. 14), while
the quality of the groups is, on average, quite constant with
N_FoF (middle panel of Fig. 13). Bijection and quality are obviously
related, and these results should be interpreted as low
multiplicity groups possessing a large amount of scatter in
the quality of grouping, meaning that they can be scattered
below the quality limits required for a successful bijection
even though the median quality is quite high. Higher multiplicity
systems possess less intrinsic scatter in the quality
of grouping, meaning they are very rarely scattered below
the bijection limits, and consequently the average bijection
fraction remains higher.

The exception to this is that the lowest mass groups 
appear to be the most accurately recovered, even though 
most observed have masses M - 10^2 /i-^ Mo . This can 
be understood when careful attention is paid to how the 
FoF algorithm constructs the groups. It creates upper limits 
for the allowed difference in either the radial (velocity) or 
tangential (physical) separation between galaxies. It must 
be the case that groups that are constructed from galaxies 
that are at the limit of the allowed separations will be larger 
in terms of projected radius and observed velocity dispersion 
than groups with galaxy separations well within these limits. 
This means they will have larger dynamical masses, and 
assuming interlopers are spread uniformly in space they will 
have a lower Qtot since they will cover a larger volume in 
redshift space, so be more likely to include interlopers. This 
is an interesting effect of the grouping, because although the 
masses measured are likely to be too small the actual groups 
are extremely secure. 

With this understandable effect in mind, different meth- 
ods for estimating the intrinsic Qtot using observed link- 
ing characteristics were investigated. The most successful 
proved to be calculating the following for each group: 



i=l Z^j = l 



tan 0[ ; 



bij (Aim,i + ^lim,j) 



A^links 



(24) 



where δ_ij is unity if i and j are directly linked (and zero otherwise),
while all other terms are as described in Eq. 1. Hence
the sum is done over allowed links within the group (N_links),
which has a limit of N_FoF(N_FoF − 1). This statistic estimates
how much closer than the allowed maximum separation all
of the galaxies are on average, and when this number is
large it indicates the group must be very compact in projection
relative to the allowed size. Fig. 15 demonstrates how
L_proj correlates loosely with Q_tot. Interestingly, the equivalent
statistic measuring the radial linking shows very little
correlation with Q_tot. This means that outliers tend to fit
quite comfortably in velocity space, but look anomalous in
projection. To aid the selection of high-fidelity groups L_proj
will be released along with the group catalogue.



4.7 Sensitivity of grouping to mock catalogues 

So far in this work we have made the implicit assumptions
that the mocks are to a large extent a good representation
of the real Universe and that optimising the grouping algorithm
to recover mock groups as accurately as possible will
have the desired effect of also returning the best groups from
the GAMA data. Clearly we should be wary of the effects of
over-tuning our algorithm to the mocks, especially given the
limitations listed in §2.3. To better understand how sensitive
our final group catalogue might be to certain intrinsic mock
properties, three small variations affecting the redshift-space
positions of the mock galaxies were implemented, which lead
to slight changes in the "observed" mocks. These perturba-

Figure 13. Total group quality (Q_tot) as a function of group redshift (z_FoF), group multiplicity (N_FoF) and group mass (M_FoF). Each
panel presents a specific subsample of groups, as indicated by the key. Solid lines represent the moving median for r_AB ≤ 19.0 (black),
r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue) survey limits. Dashed (dotted) lines are for 25 and 75 (10 and 90) percentiles. Grey points show
the r_AB ≤ 19.4 data.



tions were applied to the r_AB ≤ 19.4 mock catalogues since
that should be indicative of the impact we might expect.
The modifications consist of:

1) increasing all galaxy peculiar velocities along the line
of sight by 10%, creating groups that are less compact in
velocity space than the default mocks: mock_+.

2) reducing all galaxy peculiar velocities along the line of
sight by 10%, creating groups that are more compact in
velocity space than the default mocks: mock_−.

3) convolving all galaxy peculiar velocities along the line
of sight with a Gaussian velocity distribution of width σ =
50 km s^-1, mimicking the GAMA velocity errors: mock_σ.
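
The three modifications listed above can be sketched as follows. This is a hedged illustration rather than the code used to build the perturbed mocks: the function names are hypothetical, the convolution is implemented as one Gaussian draw per galaxy, and the conversion to an observed redshift assumes the usual non-relativistic relation 1 + z_obs = (1 + z_cos)(1 + v/c).

```python
import random

C_KMS = 299792.458  # speed of light in km/s

def perturb_velocities(v_los, mode, rng=None, sigma_err=50.0):
    """Return perturbed line-of-sight peculiar velocities (km/s)."""
    if mode == "plus":    # 10% larger peculiar velocities
        return [1.1 * v for v in v_los]
    if mode == "minus":   # 10% smaller peculiar velocities
        return [0.9 * v for v in v_los]
    if mode == "sigma":   # add Gaussian velocity errors of width sigma_err
        if rng is None:
            rng = random.Random()
        return [v + rng.gauss(0.0, sigma_err) for v in v_los]
    raise ValueError(f"unknown mode: {mode}")

def observed_redshift(z_cos, v_los):
    """Combine a cosmological redshift with a peculiar velocity (non-relativistic)."""
    return (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0

velocities = [250.0, -120.0, 40.0]  # illustrative values in km/s
for mode in ("plus", "minus", "sigma"):
    perturbed = perturb_velocities(velocities, mode, rng=random.Random(3))
    print(mode, [round(observed_redshift(0.2, v), 5) for v in perturbed])
```

Only the line-of-sight component is touched, so the sky positions of the mock galaxies are unchanged and only their redshifts shift.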

The first two sets of mocks, mock_+ and mock_−, test the
sensitivity of the grouping to the fidelity with which small scale
redshift space distortions are accounted for in the mocks.
From Kim et al. (2009) (and Norberg et al. in prep) we know
that the Bower et al. (2006) semi-analytic galaxy formation
model does not reproduce very accurately the redshift space

Figure 14. Bijective group fraction (E_tot) as a function of group redshift (z_FoF), group multiplicity (N_FoF) and group mass (M_FoF).
Each panel presents a specific subsample of groups, as indicated by the key. Solid lines represent the moving median for r_AB ≤ 19.0
(black), r_AB ≤ 19.4 (red) and r_AB ≤ 19.8 (blue) survey limits. Dashed (dotted) lines are for 25 and 75 (10 and 90) percentiles. Grey
points show the r_AB ≤ 19.4 data.



clustering on Mpc scales and smaller. By systematically
modifying the peculiar velocities by ±10% and by keeping
the same FoF grouping parameters we attempt to address
this mismatch between data and mocks and measure how
sensitive the grouping is to such differences. From Norberg et
al. (in prep) we expect that an additional velocity bias of
∼ +10% to the mock galaxies should be enough to reconcile
the redshift space clustering of the mocks and the data. The
third set, mock_σ, tests the sensitivity of the grouping to



velocity errors, which were not considered in the nominal
mocks described in §2.3 but are clearly present in the GAMA
data. To fully simulate how we treat the errors for the real
GAMA data the velocity errors are taken off in quadrature
as described in Eq. 17.

The FoF algorithm with the nominal parameters as
listed in Table 1 is applied to the three sets of mocks. The
FoF grouping of the standard and modified mocks results
in very similar findings. The first impact these perturbed

Figure 15. Comparison of the observed linking strength L_proj
with the intrinsic group quality Q_tot. The colour of each data
point represents the group multiplicity, going from N_FoF = 5
(red) to N_FoF = 200 (blue). The correlation is strongest for low
multiplicity systems, which is important since it is these that
can be pathologically bad. The black line is the linear regression
fit to the entire data, so it predominantly describes the lower
multiplicity systems.

mocks might have on the grouping is on the group assignments
themselves, so S_tot was calculated for all 3 varieties of
new mocks, where the reference mock data is now the original
mock light cone. This means we are only analysing how similar
the new mock FoF groupings are to the original set, not
to the "true" mock groupings. S_tot is found to be ≳ 0.97 for
all three varieties of mock perturbation for N_halo ≥ 2, and
only drops slightly for N_halo ≥ 20, which shows the greatest
discrepancy. In this regime mock_+ has S_tot = 0.94, mock_−
has S_tot = 0.96 and mock_σ has S_tot = 0.93.

For the estimated masses, it is obvious that mock_− and
mock_+ will require slightly different scaling relations to recover
unbiased halo masses. The global mass scaling factor
(where N_FoF ≥ 5) for mock_−, A_−, needs to be 11.6, so 16%
larger than A, while A_+ needs to be ≈ 8.7, so 15% smaller
than A. This implies that we have an underlying systematic
uncertainty of at least 15% on all masses, assuming we
expect the true physics to vary the galaxy velocities at the
10% level. Naively we might have expected the difference
to be at the ∼ 20% level since 1.1^2 = 1.21, but the random
nature of peculiar velocities and the slight variation in
grouping conspire to reduce the variation.

For mock_σ we require exactly the same global scaling
relation as before, i.e. A_σ = A = 10.0. This implies that removing
the velocity error in quadrature is the correct procedure,
and means we certainly do not expect the uncertainty
in radial velocities to have a significant effect on the implied
masses.
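
The effect of removing velocity errors in quadrature can be checked with a toy experiment. The sketch below assumes the simple form σ_corr² = σ_obs² − σ_err² and uses a plain standard deviation as the dispersion estimator, which may differ from the estimator used for the catalogue; the 300 km/s intrinsic dispersion is an illustrative value.

```python
import math
import random
import statistics

def corrected_dispersion(sigma_obs, sigma_err):
    """Remove a known velocity error from an observed dispersion in quadrature."""
    return math.sqrt(max(sigma_obs ** 2 - sigma_err ** 2, 0.0))

rng = random.Random(7)
sigma_true, sigma_err = 300.0, 50.0  # km/s; sigma_true is an illustrative value

# Intrinsic line-of-sight velocities, then the same velocities with errors added.
v_true = [rng.gauss(0.0, sigma_true) for _ in range(20000)]
v_obs = [v + rng.gauss(0.0, sigma_err) for v in v_true]

sigma_obs = statistics.pstdev(v_obs)
sigma_corr = corrected_dispersion(sigma_obs, sigma_err)
print(f"observed {sigma_obs:.1f} km/s, corrected {sigma_corr:.1f} km/s")
```

Because independent Gaussian errors add in quadrature, the corrected dispersion should return close to the intrinsic value, and a mass estimate that scales with the squared dispersion then needs no change of scaling factor.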

The implications for the group luminosities are, as expected,
very marginal w.r.t. these modifications of the
mocks, which is a result of the grouping still being rather
good for all three sets of mocks (as evidenced by the marginal
change in S_tot) despite the algorithm not being tuned to
them.



5 GLOBAL PROPERTIES OF G³Cv1

Having run extensive optimisations and calculated refine- 
ments based on the mock catalogues, the algorithm was run 
over the real GAMA data. In total, taking the deepest ver- 
sion of each GAMA survey region possible, 14,388 groups 
were formed containing 44,186 galaxies out of 110,192 galax- 
ies in our volume limited selection, meaning 40% of all 
galaxies are assigned to a group. This is almost identical 
to the average grouping rate found in the mocks, also 40%. 

The headline group number statistics are listed in Table 7
for each of the GAMA regions, i.e. G09, G12 and G15.
r_AB ≤ 19.0 and r_AB ≤ 19.4 catalogues were made for G09,
G12 and G15, and an extra r_AB ≤ 19.8 catalogue was created
for G12 (the only region that has deep enough spectroscopy).
This table also includes the expectation from the
mocks with the minimum and maximum numbers of groups
in the 9 GAMA lightcone mocks. Subsets that have numbers
that are outside the min-max range of the mocks are flagged
with an asterisk.

From Table 7 we conclude that for most multiplicity
ranges and survey limits the number of GAMA groups detected
is very comparable to the predictions from the GAMA
lightcones. Over the full GAMA lightcones G12 and G15
are very close to the mean counts recovered from the mocks
whilst G09, although very much at the underdense extreme,
is not outside the min-max range expected. The comparison
between data and mocks seems less favourable when
splitting the groups by redshift and survey depth, where
5 GAMA subsets lie outside of the min-max limits of the
mocks. The difference becomes less and less significant the
deeper the survey is and seems to be most significant in G09,
which is underdense in all subsets investigated.
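
The asterisk convention can be made concrete with a short check. The sketch below uses the N_FoF ≥ 20 counts at r_AB ≤ 19.0 from Table 7 and flags any region whose count falls outside the mock (low, high) range; the function name is hypothetical and the comparison is assumed to be inclusive of the range end points.

```python
# Group counts with N_FoF >= 20 at r_AB <= 19.0, taken from Table 7.
counts = {"G09": 8, "G12": 16, "G15": 16}
mock_low, mock_high = 15, 39  # extreme values across the 9 mock lightcones

def flag_outliers(region_counts, low, high):
    """Mark regions whose count lies outside the mock min-max range."""
    return {
        region: "*" if not (low <= n <= high) else ""
        for region, n in region_counts.items()
    }

flags = flag_outliers(counts, mock_low, mock_high)
for region, n in counts.items():
    print(f"{region}: {n}{flags[region]}")
```

With these inputs only G09 is flagged, matching the asterisk in the corresponding row of the table.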

It is well established that G09 is underdense at
z < 0.2 compared to the whole of SDSS (Driver et al. 2011),
whilst G12 and G15 are closer to the large scale average.
Overall, this underdensity accounts for why we find fewer
groups in G09. G09 is similar to the most underdense and
group sparse GAMA area found in the mocks, suggesting it
is a rare event in the mocks but at least not completely unmatched.
G12 is most like the typical mock distribution, and
the GAMA r_AB ≤ 19.8 group catalogue is the most similar
to the mocks of all catalogues. This catalogue tends to contain
fewer large multiplicity groups than predicted by the
mocks. These inconsistencies are not highly significant overall,
but they reflect similar findings in the 2PIGG catalogue
(Eke et al. 2004).

Fig. 16 shows the position of the GAMA groups in redshift
space projected onto the equatorial plane, with the
symbol size reflecting the group multiplicity and colour the
group velocity dispersion. The highest multiplicity groups
are at lower redshifts, as should be expected in an apparent
magnitude limited sample. This figure particularly highlights
the sample variance seen between regions, as already
mentioned in the discussion of Table 7. There are vast regions
of space that contain massive clusters and an assortment
of groups, overlapping so tightly as to produce patches
of solid colour in the plot. However, between these large filamentary
regions there are voids that, whilst still possessing
galaxies (in lower densities), barely contain a single significant
group. At low redshifts (z < 0.1) where the mean galaxy

r_AB ≤ 19.0          G09      G12      G15      Mocks (low, high)

N_group 2-4         2051     2409     2436     2334 (3154, 4100)
N_group 5-9          190      233      234      253 (188, 294)
N_group 10-19         45       55       59       66 (43, 82)
N_group 20+            8*      16       16       26 (15, 39)

z_group 0-0.1        419      577      512      531 (318, 856)
z_group 0.1-0.2      973     1369     1450*    1144 (803, 1381)
z_group 0.2-0.3      725      640      633      814 (606, 996)
z_group 0.3-0.5      178      127      100*     189 (125, 258)

Total               2294     2713     2745     2678 (2204, 3107)

r_AB ≤ 19.4          G09      G12      G15      Mocks (low, high)

N_group 2-4         3334     3703     3776     3623 (3154, 4100)
N_group 5-9          329      395      339      390 (322, 455)
N_group 10-19         75       79      102      102 (69, 133)
N_group 20+           17*      26       25       40 (20, 55)

z_group 0-0.1        514      705      597      634 (379, 1028)
z_group 0.1-0.2     1338     1829     1967*    1552 (1076, 1841)
z_group 0.2-0.3     1372     1217     1198     1377 (1074, 1683)
z_group 0.3-0.5      531      452      480      593 (421, 730)

Total               3755     4203     4242     4156 (3578, 4728)

r_AB ≤ 19.8          G12      Mocks (low, high)

N_group 2-4         5687     5520 (4861, 6101)
N_group 5-9          539      584 (509, 661)
N_group 10-19        121      155 (98, 189)
N_group 20+           44       62 (34, 88)

z_group 0-0.1        857      746 (437, 1204)
z_group 0.1-0.2     2331     2024 (1424, 2424)
z_group 0.2-0.3     1997     2124 (1683, 2584)
z_group 0.3-0.5     1206     1426 (1044, 1708)

Total               6391     6321 (5535, 7025)

Table 7. Number of galaxy groups as a function of multiplicity, redshift and survey depth. The GAMA groups are split by GAMA
regions, i.e. G09, G12 and G15. For the mocks, the mean number of groups between all 9 mock GAMA lightcones in a single GAMA
region of ∼ 48 deg² is listed together with their low and high extreme across all mocks (within brackets). Samples with an asterisk are
those which are outside the min-max range of the mocks.



number density is the highest, such voids are still very evi- 
dent in the GAMA data. 

We still see groups of significant size ($N_{\rm FoF} \geq 20$) beyond a redshift of 0.3 in G09, and
there is evidence of filamentary structure in the underlying galaxy population beyond $z \sim 0.4$ in
G12 (G12, being 0.4 mag deeper than G09/G15, probes structure to slightly higher redshifts). In G12
there are a number of low multiplicity systems beyond a redshift of 0.4; these groups appear to be
associated with nodes in filamentary structure and have been visually identified as large clusters.
This means that GAMA is able to measure the evolution of group properties and filamentary structure
over a redshift baseline of 0-0.5, which is 5 Gyr, or 36% of the lifetime of the Universe, an
evolutionary time span for large scale structure analysis that is unprecedented in a single coherent
survey.
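
As a rough check of the quoted baseline, the sketch below evaluates the lookback time to z = 0.5 with the closed-form age of a flat LCDM universe. The cosmological parameters (Omega_m = 0.25, Omega_Lambda = 0.75, H0 = 73 km/s/Mpc) are assumptions chosen to resemble the Millennium simulation cosmology, and the function name is hypothetical.

```python
import math


def age_in_hubble_times(z, omega_m=0.25):
    """Age of a flat LCDM universe at redshift z, in units of 1/H0."""
    omega_l = 1.0 - omega_m
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(x)


H0 = 73.0                      # km/s/Mpc, assumed value
hubble_time_gyr = 977.8 / H0   # 1/H0 in Gyr (977.8 Gyr for H0 = 1 km/s/Mpc)

age_now = age_in_hubble_times(0.0)
age_at_z = age_in_hubble_times(0.5)

lookback_gyr = (age_now - age_at_z) * hubble_time_gyr
lookback_fraction = (age_now - age_at_z) / age_now

print(f"lookback time to z=0.5: {lookback_gyr:.2f} Gyr")
print(f"fraction of the age of the Universe: {lookback_fraction:.1%}")
```

The fraction does not depend on H0, only on the assumed matter density, so it is the more robust of the two numbers to compare with the text.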

Fig. [17] shows a series of one degree wide declination slices in G12 that cover
$0.15 \leq z_{\rm group} \leq 0.2$. The black points show the locations of individual galaxies, and
as expected the groups closely trace overdensities seen in the galaxy distribution. Intriguingly, we
see evidence of extremely fine filamentary structure that is not associated with any of the defined
groups. If these structures were purely radial in direction then they could be dismissed as
misidentified systems, for which the apparent filament merely betrays the velocity dispersion along
the line of sight. Instead we see gentle sweeping arcs that curve steadily both radially and in
projection, implying that they are real fine filamentary structures connecting group nodes. This is
probably one of the first times that the galaxy distribution has been seen to mimic so closely the
filamentary distribution that is so commonly seen in large dark matter dominated numerical
simulations.

The most striking of these filaments can be found in the top-right panel of Fig. [17], where fine
strands can be seen extending out from $\alpha \sim 180$ deg and $z \sim 0.18$, and also from
$\alpha \sim 182$ deg and $z \sim 0.19$. In both of these cases it is possible to identify group and
cluster nodes that connect the filaments together, but there are no groups detected within the
filaments themselves. It is important to highlight that without GAMA redshifts these regions would
previously have been identified as void-like, and that the additional galaxies are not randomly
distributed 'field' galaxies: they appear to lie in extremely well defined environments, but are
non-grouped w.r.t. the GAMA mean galaxy number density.

Having considered the spatial distribution of GAMA galaxy groups, we turn to Fig. [18], which shows
the distributions of four basic properties of the GAMA galaxy group catalogue (G$^3$Cv1): the
observed group multiplicity, mass, velocity dispersion and radius distributions. We now discuss them
in turn.

The top left panel of Fig. [18] presents the distribution of group multiplicities for three survey
depths (coloured solid lines), to be compared to the equivalent average mock multiplicity
distributions (dashed lines). Unsurprisingly, the raw number of groups increases with survey depth,
which explains why the three coloured curves are ordered by survey depth, i.e.
$r_{\rm AB} \leq 19.0$ (black), $r_{\rm AB} \leq 19.4$ (red) and $r_{\rm AB} \leq 19.8$. More
importantly, the number of high multiplicity systems is significantly different between data and
mocks, a result already seen in Table [7], while the numbers are much more similar for low
multiplicity systems. The difference at the high multiplicity end is important and puts key
constraints on the galaxy formation model used. The group multiplicity distribution is mostly
sensitive to the Halo Occupation Distribution (HOD), since for a given number of haloes the group
multiplicity distribution is entirely dependent on the HOD. A known feature of the GALFORM (Bower
et al. 2006) galaxy formation model is its tendency to populate the more massive haloes with an
excess of faint satellite galaxies (e.g. Kim et al. 2009).

The top right panel of Fig. [18] presents the distribution of group masses for three survey depths
(coloured solid lines), to be compared to the equivalent average mass distributions from the mocks
(dashed lines). For the comparison to be as fair as possible, the group masses used for the mocks
are estimated in exactly the same way as for the data. Because velocity uncertainties have not been
included in the mocks, it is essential to remove from this comparison all groups whose velocity
dispersion estimate is significantly affected by this uncertainty, as the group mass is proportional
to $\sigma^2$ (see Eq. [18]) and this would bias the distribution. To achieve this we simulated mock
groups with an 80 km s$^{-1}$ velocity error and calculated the velocity dispersion above which more
than 95% of the population should be robust to being scattered below the presumed GAMA group
velocity error (which would give a corrected $\sigma$ of 0 km s$^{-1}$). This velocity dispersion
limit was found to be 130 km s$^{-1}$. Thus the top-right panel only shows a comparison of groups
where this selection has been applied.
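
The role of the velocity error can be sketched as follows. This assumes the corrected dispersion is obtained by subtracting the velocity error in quadrature and flooring at zero; that is an illustrative assumption, and the function names are hypothetical. The 80 km/s error and 130 km/s limit are the values quoted in the text.

```python
import math


def corrected_sigma(sigma_obs, sigma_err):
    """Subtract the velocity error in quadrature, flooring the result at zero."""
    excess = sigma_obs ** 2 - sigma_err ** 2
    return math.sqrt(excess) if excess > 0.0 else 0.0


def passes_sigma_cut(sigma, limit=130.0):
    """Keep only groups at or above the velocity dispersion limit (km/s)."""
    return sigma >= limit


velocity_error = 80.0  # km/s, the error used for the simulated mock groups

for sigma_obs in (60.0, 100.0, 130.0, 300.0):
    sigma_corr = corrected_sigma(sigma_obs, velocity_error)
    print(f"observed {sigma_obs:5.0f} km/s -> corrected {sigma_corr:6.1f} km/s, "
          f"kept: {passes_sigma_cut(sigma_obs)}")
```

Any group whose observed dispersion falls below the error collapses to a corrected value of zero, which is why low dispersion systems are excluded from the mass comparison.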

The agreement between data and mocks beyond $\sim$ 10^^ $M_\odot$ is remarkably good for all survey
depths, with only the normalisation being possibly slightly lower for the GAMA data than for the
mocks (although within the typical scatter expected from sample variance). The relative profiles are
all very similar. We note that this mass distribution has been convolved with the error distribution
on the group masses, which have been estimated using a single correction factor ($A = 10$). This
explains why unrealistically large group masses are found (e.g. greater than 10^^ $M_\odot$). More
detailed work on estimating the group masses is underway (Alpaslan in prep.).

Figure 16. Redshift space position of GAMA galaxy groups projected onto the equatorial plane, split by survey area and with symbol
size reflecting the group multiplicity and symbol colour the group velocity dispersion (see figure keys for exact values). G09 and G15
are for a survey depth of $r_{\rm AB} \leq 19.4$, while G12 is for $r_{\rm AB} \leq 19.8$, explaining why the number of groups detected at higher redshifts is
larger in G12 compared to G09 and G15. At low redshifts where the projection effects are the smallest, groups are still visually strongly
associated with the filaments and nodes of the larger scale cosmic structure. Fewer groups are found beyond at higher redshift, a result

Figure 17. Four one degree wide declination slices of the GAMA G12 region covering the $0.15 < z < 0.20$ redshift range. Declination
increases from left to right and top to bottom, as indicated by the panel key. Galaxies are shown with black dots, and galaxy groups
with the same symbols as in Fig. [16].

The bottom left panel of Fig. [18] presents the distribution of group velocity dispersions for three
survey depths (coloured solid lines), to be compared to the equivalent average group velocity
dispersion distributions from the mocks (dashed lines). For the comparison to be as fair as
possible, the velocity dispersion used for the mocks is estimated in exactly the same way as for the
data. Because velocity uncertainties have not been included in the mocks, it is essential to remove
from this comparison all groups for which the velocity dispersion estimate is significantly affected
by this uncertainty. This can be done straightforwardly by ignoring groups with
$\sigma \leq 130$ km s$^{-1}$ (as discussed above). Beyond that limit in the velocity dispersion
distribution, the data and mock distributions are very comparable, showing yet again how closely
matched the mocks and the data are. For smaller velocity dispersion systems a more careful modelling
of the velocity errors (and hence velocity dispersion errors) is needed before any conclusions can
be drawn on how appropriate the mocks are. Work is currently ongoing within GAMA to better
understand the precise nature, and distribution, of the redshift velocity errors. A full comparison
is deferred until these errors have been fully characterised.

Finally, the bottom right panel of Fig. [18] presents the distribution of group radii for three
survey depths (coloured solid lines), to be compared to the equivalent average group radius
distributions from the mocks (dashed lines). Considering the full sample of groups, the mocks and
the data seem to be very comparable.

Figure 18. Global group properties of the GAMA galaxy group catalogue (G$^3$Cv1) compared to the corresponding mock group catalogue:
group multiplicity distribution (top left), dynamical group mass distribution limited to $\sigma_{\rm FoF} \geq 130$ km s$^{-1}$ (top right), group velocity
dispersion distribution limited to $\sigma_{\rm FoF} \geq 130$ km s$^{-1}$ (bottom left) and group radius distribution (bottom right). Solid (dashed) lines are for
GAMA (mocks) at the $r_{\rm AB} \leq 19.0$ (black), $r_{\rm AB} \leq 19.4$ (red) and $r_{\rm AB} \leq 19.8$ survey limits. The denominator shown on the y-axis is the bin
width applied, so the numbers quoted are per the stated denominator. See text for discussion.



To investigate in more detail where differences between the GAMA data and the mocks may reside, we
divided the mass, velocity dispersion and radius distributions into multiplicity subsets
(Fig. [19]). For clarity, Fig. [19] only uses the $r_{\rm AB} \leq 19.4$ survey limit, the deepest
limit appropriate for all GAMA regions. Also, the distributions for each of the 9 mock lightcones
are shown with grey lines rather than the sample mean shown in Fig. [18]. This allows us to see
where the GAMA group distributions lie in the context of the full range of mock distributions, and
therefore how significant the differences are as a function of each parameter. Plotting in this
manner makes the comparison much clearer than showing error bars. The agreement is very good for
$2 \leq N_{\rm FoF} \leq 4$ for all three group properties plotted; however, discrepancies are
apparent for higher multiplicities, both in normalisation and, to a lesser extent, in shape.

For the mass distributions (top panel of Fig. [19]) it is clear that GAMA possesses a lower
normalisation in counts compared to the mock groups, an effect that is more noticeable for larger
multiplicities. The largest deviations in the shapes of the distributions are seen for
$M_{\rm FoF} \leq$ 10^^ $M_\odot$, where we see excess number counts for the mock groups. This
difference is most evident for $5 \leq N_{\rm FoF} \leq 9$. The most likely explanation for this low
mass excess comes from the finding that mock groups are typically more compact than GAMA groups,
which will naturally lead to a lower mass estimate. The radial discrepancies are discussed in more
detail below.

The velocity dispersion (middle panel of Fig. [19]) only shows strong evidence of a normalisation
offset: the agreement is excellent for low multiplicity systems, but as the multiplicity increases
the GAMA groups show a general deficit in counts. Since the strength of the normalisation offset
varies with multiplicity, the difference cannot simply be due to sample variance, in which case all
multiplicity subsets would show the same deficit.

The differences between GAMA and the mocks are most pronounced for the group radius (bottom panel of
Fig. [19]). The most significant deviations are seen where ${\rm Rad}_{50} \leq 0.2\,h^{-1}$ Mpc:
GAMA finds many fewer systems, and the effect is much more significant for higher multiplicities,
where the mocks contain a significant excess of compact systems not seen at all in the data. At the
GAMA median redshift ($z \sim 0.2$), a 0.1 $h^{-1}$ Mpc (comoving) radius corresponds to an angular
separation of 25" on the sky. Whilst the simplest explanation might be that the GAMA survey suffers
from significant close pair incompleteness, Fig. 19 of Driver et al. suggests this is not the case:
GAMA is better than 95% complete for systems with up to 5 neighbours within 40" (on the sky). These
separations are much larger than the expected optical confusion limit (1-2"), so photometric bias
(i.e. close pairs not being deblended) cannot explain the discrepancies we find. Since the main
difference seen in velocity dispersion between the mocks and the GAMA data is in the normalisation,
the more compact mock groups appear to be the origin of the low mass population we find in the top
panels of Fig. [19].

The differences seen in Fig. [19] could well be due to limitations in the physics implemented in the
GALFORM (Bower et al. 2006) semi-analytic galaxy formation model, where the exact distribution of
galaxies within a halo depends on their dynamical friction timescale and on which dark matter
particle the galaxy was originally associated with. Despite the high numerical resolution of the
Millennium simulation, the vast majority of the satellite galaxies in the galaxy formation model are
not resolved in sub-haloes, implying that their merging timescales are governed by an analytic
calculation and that their position is given by the most bound dark matter particle of their parent
halo. A consequence of too long a merging timescale is an overabundance of galaxies at small
distances from the centre of the halo. This, together with the definition of group radius adopted
for this work (i.e. Rad$_{50}$), is the most likely explanation for the apparent excess of compact
groups in the mocks compared to the data. This also creates a deficit of low mass groups in the GAMA
data in comparison to the mocks, since the dynamical masses are directly proportional to the
measured group radius.

In summary, the GAMA group catalogue (G$^3$Cv1) and its mock counterpart are similar in many
respects, but not all. From the discussion of Fig. [18] and Fig. [19] it has become clear that
G$^3$Cv1 is already providing new constraints on the galaxy formation model used to construct the
mocks, constraints that will be implemented in the next generation of mocks. Investigating the
discrepancies between GAMA and mock group catalogues, and the impact this has on any measured halo
mass function (HMF), is a complex and important task. A full analysis is deferred to a GAMA paper in
preparation, which will present a more in-depth analysis of a series of statistically equivalent
mocks as well as galaxy formation based mocks as used here. Only with a large variety of mocks will
it be possible to put realistic constraints on the underlying dark matter model. The analysis in the
present paper is entirely limited to one family of mock realisations, which explains why the
constraints from the GAMA groups are so far mostly limited to possible constraints on the galaxy
formation model rather than on the underlying dark matter physics.



6 GROUP EXAMPLES 

For every group we create a rgb image as a i^AB-TAB-t^AB- 
band composite, along with visual diagnostics that allow 
interesting features to be easily identified. Example images 
are shown in Fig. [20], Fig. [21] and Fig. [22], and are discussed
below.

Fig. [20] highlights four cluster-scale groups extracted from the GAMA data. The top panels show
relatively low redshift clusters with high multiplicities, whilst the bottom panels are examples of
low multiplicity groups that show evidence for many associated galaxies that are fainter than the
GAMA survey limits (shown by a dashed red line on the luminosity distribution plotted in each
panel). All of these groups are quite circularly symmetric and concentrated towards the centre, both
of which are indicators of a well virialised population of galaxies.

Fig. [21] shows groups at radically different stages of evolution. The top panels show examples of
fossil groups with one exceptionally dominant BCG. In both cases only the BCG had a known redshift
before GAMA, and the large peak in the redshift distribution suggests particularly strong radial
linking, an indication that the grouping is reliable. The bottom panels show groups with very loose
association in comparison. Both groups are quite massive (in the cluster regime) and have
identifiable background galaxies, but neither exhibits a centrally concentrated distribution of
galaxies or a dominant BCG. Both of these groups have a relatively uniform redshift distribution,
showing none of the strong central peak seen for the fossil groups in the top panels. The
bottom-right group in particular has a very flat luminosity distribution and an extremely
non-circular distribution of galaxies. The most likely scenario is that this group has two distinct
sub-structures (top and bottom) collapsing into each other, where the bottom structure is physically
nearer to us in space and thus exhibits a large extra component of recessional velocity towards the
centre of mass (CoM).

Figure 19. Distribution of GAMA and mock galaxy group mass (top panels), velocity (middle panels) and radius (bottom panels), for a
survey depth of $r_{\rm AB} \leq 19.4$. GAMA is shown in red while the mocks are in grey. Multiplicity subsets are as stated in each panel. For the
mass and velocity panels the mocks and GAMA data are limited to $\sigma \geq 130$ km s$^{-1}$, which is required to avoid the effects of velocity errors in the
GAMA data biasing the results. For the mass and velocity plots the clearest differences are normalisation offsets, and for $N_{\rm FoF} \geq 5$ there
is a clear tendency for GAMA groups to have smaller $M_{\rm FoF}$ and $\sigma_{\rm FoF}$ for a given multiplicity subset. The distributions are significantly
different for compact systems (${\rm Rad}_{50} \leq 0.2\,h^{-1}$ Mpc) with $N_{\rm FoF} \geq 5$, where GAMA groups are less compact in projection. This effect
becomes more significant for higher multiplicity subsets.

Fig. [22] shows particularly pleasing examples of galaxy-galaxy merging/interactions. A natural
outcome of the GAMA group catalogue is that nearly all possible close pairs will be grouped (modulo
a very small amount of incompleteness). Often these merging systems will be found in higher
multiplicity systems, but here are examples of two-member groups that exhibit evidence for mergers.
The top-left and top-right panels show quite similar looking systems: a red (likely passive) galaxy
interacting with a blue (late-type) galaxy. The top-left panel has larger tidal tails and more of
the flux is in the late-type system, suggesting it is at an early stage of the merging process. The
top panels are examples where the multi-pass nature of GAMA has overcome the problems of fibre
collisions to give us redshifts for both galaxies in the merging system. The bottom panels show
merging systems that are both too faint and too close to be obtainable with SDSS data. The
bottom-left panel system appears to be a triple merger system, where the blue galaxy to the right
does not have a GAMA redshift because it is too faint. The bottom-right panel shows two extremely
faint and relatively ix-band bright galaxies merging; a tidal connection can be seen between them.
In both of these bottom panels the groups in question have extremely low velocity dispersions
($\sim 45$ km s$^{-1}$) and very low implied dynamical masses ($\sim$ 10^^ $h^{-1}\,M_\odot$).

Figure 20. Top panels show two cluster scale groups confirmed spectroscopically. Bottom panels show low multiplicity groups with
significant, possibly associated, background galaxies. The rgb image is a XAB-^AB-'^AB-band composite. The size of the circle marking
group members scales with the $r_{\rm AB}$-band flux and its colour reflects the galaxy $u_{\rm AB} - r_{\rm AB}$ colour. A galaxy redshifted w.r.t. the group
median redshift has a red upwards pointing line whose length scales with the velocity difference, while for a blueshifted one the line is
blue and points downwards. The rings represent the 50th, 68th and 100th percentiles of the radial galaxy distributions relative to the
iterative group centre. The velocity PDF smoothed with a Gaussian kernel of width $\sigma = 50$ km s$^{-1}$ (the typical GAMA velocity error)
is shown on the left of each panel, where the group median is shown with a green dashed line and the BCG with a black dashed line.
The bottom plot presents the raw absolute $r_{\rm AB}$ magnitude distribution of the group, with the effective GAMA survey limit shown with
a red dashed line, the group median absolute magnitude with green, and the BCG absolute magnitude with black.

In such systems dynamical friction is acting in such a manner that the dynamical mass will likely
not be a good indicator of the intrinsic halo mass; rather, it highlights a system where the energy
has been transferred from group scale kinematics (energy in galaxies) to galaxy scale kinematics
(energy in the stars/gas). Dynamical friction conspires to reduce the velocity difference and
physical distance between merging galaxies, and since we use $M_{\rm FoF} \propto \sigma^2 R$ this
will also reduce the implied dynamical mass that we measure.
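
The scaling of the dynamical mass with velocity dispersion and radius can be sketched as below. This assumes the form M = A sigma^2 R / G with the single scaling factor A = 10 mentioned earlier; the function name and the example radius of 0.05 Mpc are hypothetical, and factors of h are ignored for simplicity.

```python
# Gravitational constant in Mpc km^2 s^-2 Msun^-1
G_MPC = 4.301e-9


def dynamical_mass(sigma_kms, radius_mpc, scale_a=10.0):
    """Dynamical mass in solar masses, assuming M = A * sigma^2 * R / G."""
    return scale_a * sigma_kms ** 2 * radius_mpc / G_MPC


# A low velocity dispersion pair, loosely modelled on the merging systems above.
mass = dynamical_mass(sigma_kms=45.0, radius_mpc=0.05)
print(f"implied dynamical mass: {mass:.2e} Msun")

# Dynamical friction shrinks both sigma and R, so the implied mass drops quickly.
mass_later = dynamical_mass(sigma_kms=30.0, radius_mpc=0.025)
print(f"after further orbital decay: {mass_later:.2e} Msun")
```

Because the mass depends on the square of the velocity dispersion, even a modest loss of orbital energy lowers the estimate substantially, independent of the true halo mass.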



6.1 GAMA group catalogues 

Figure 21. Top panels are potential fossil groups, where the BCG is at least 2 magnitudes brighter than the second ranked galaxy in
the $r_{\rm AB}$-band (in the case of the top-right group the second ranked galaxy is nearer in magnitude than this, but it is separated by a large
distance in projection). Bottom panels show groups with complex in-fall structure. See Fig. [20] for figure description.

The generation of a group catalogue produces a myriad of outputs, most of which are not of interest
to the typical user. To ease interpretation for the average user, a deliberately simplified set of
outputs will be made available. For each GAMA region two tables are released. The first is a two
column link list that identifies which group every galaxy belongs to. The second is a table of group
properties with the most important attributes of each group. This includes the group radius
Rad$_{50}$, the velocity dispersion $\sigma_{\rm FoF}$ and the implied dynamical mass. Other metrics
related to each group are also calculated to aid the analysis and interpretation of individual
grouping quality. As well as the $L_{\rm proj}$ linking quality discussed above, the kurtosis of the
radial separation of all galaxies from the group centre is calculated, and the 'modality' of the
system is also computed using $(1 + {\rm skewness}^2)/(3 + {\rm kurtosis})$, where the kurtosis is
the excess kurtosis. This will be 1/3 for a normal distribution and 0.555 for a uniform one, and it
is a useful metric since it does not just provide information on how non-Gaussian the velocity
profile of each system is: it also provides information on whether the velocity profile is more
cusped or cored than a Gaussian distribution. Additionally, in a similar manner to how the local
over-density was calculated in a comoving cylinder centred around each galaxy, the local relative
density is calculated for each group. This is calculated using a comoving cylinder of radius
1.5 $h^{-1}$ Mpc and total radial depth of 36 $h^{-1}$ Mpc, and gives a measure of how isolated the
groups are relative to much larger scale structure.
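
The modality statistic can be checked numerically with a short sketch. It uses the form (1 + skewness^2)/(3 + excess kurtosis), which is the form that reproduces the quoted values of 1/3 for a normal distribution and 0.555 for a uniform one. The function name and the test samples are illustrative assumptions, not part of the catalogue pipeline.

```python
import random


def modality(values):
    """(1 + skewness^2) / (3 + excess kurtosis), from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return (1.0 + skewness ** 2) / (3.0 + excess_kurtosis)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Gaussian velocities with an arbitrary 200 km/s dispersion.
normal_sample = [rng.gauss(0.0, 200.0) for _ in range(100_000)]
# Evenly spaced values stand in for a uniform distribution.
uniform_sample = [i / 10_000 for i in range(10_001)]

modality_normal = modality(normal_sample)
modality_uniform = modality(uniform_sample)

print(f"normal:  {modality_normal:.3f}")
print(f"uniform: {modality_uniform:.3f}")
```

Values above 1/3 indicate a more cored (flatter) profile than a Gaussian, while values below 1/3 indicate a more cusped one, provided the skewness is small.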

Finally, as a separate but useful output from creating the GAMA galaxy group catalogue, a full pair
catalogue will be released. This is a natural output of the galaxy-galaxy linking stage of the
grouping algorithm, and includes all pairs that are within a common velocity separation of
1000 km s$^{-1}$ and a physical projected separation of 50 $h^{-1}$ kpc. This will be used within
the team for work involving the study of galaxy pairs.
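
The pair selection described above amounts to two simple cuts, sketched here. The function name, tuple layout and candidate values are hypothetical; only the 1000 km/s and 50 kpc/h limits come from the text.

```python
def is_close_pair(dv_kms, rproj_kpc_h, max_dv=1000.0, max_rproj=50.0):
    """True when a galaxy pair satisfies both the velocity and projected cuts."""
    return abs(dv_kms) <= max_dv and rproj_kpc_h <= max_rproj


# (galaxy id A, galaxy id B, velocity separation in km/s, projected separation in kpc/h)
candidates = [
    (1, 2, 120.0, 18.0),    # close in both velocity and projection
    (3, 4, 1500.0, 10.0),   # too large a velocity separation
    (5, 6, -300.0, 75.0),   # too widely separated on the sky
]

pairs = [(a, b) for a, b, dv, rproj in candidates if is_close_pair(dv, rproj)]
print(pairs)
```

Only the first candidate survives both cuts, so it is the only one that would enter a pair catalogue built this way.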



7 CONCLUSIONS 

Figure 22. Examples of ultra low-mass groups that are also excellent candidates for merging systems. The top plots are groups
that are within the nominal SDSS $r_{\rm AB} \leq 17.77$ limit, but one or both galaxies are missing from that survey due to fibre collisions. The
bottom plots are groups that are both too faint and too close together to be present in a spectroscopic SDSS catalogue. See Fig. [20] for
figure description.

In this paper we have presented a new group catalogue based on the spectroscopic component of the
GAMA survey. The FoF based grouping algorithm used has been extensively tested on semi-analytic
derived mock catalogues, and has been designed to be extremely robust to the effects of outliers and
linking errors. The velocity dispersion and radius of the groups are median unbiased, even when
allowing for the possibility of catastrophic grouping errors. Globally, 77% of the recovered FoF
groups bijectively (unambiguously) match a mock group, and 89% of all mock groups are bijectively
recovered. The purity of all FoF groups is 80%, and for mock groups the equivalent figure is 73%.
This suggests that the FoF algorithm is quite well balanced and does not have a strong preference
for over-grouping or for conservatively recovering just the strongly bound core of systems.
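
A bijective match can be sketched as a two-way membership test. This assumes that a match is bijective when more than half of the members of each group are shared with the other; that threshold, the function name and the example groups are assumptions made for illustration.

```python
def is_bijective_match(fof_members, mock_members, threshold=0.5):
    """True when more than `threshold` of each group's members are shared."""
    fof, mock = set(fof_members), set(mock_members)
    shared = len(fof & mock)
    return shared / len(fof) > threshold and shared / len(mock) > threshold


# Galaxy ids for a recovered FoF group and two candidate mock haloes.
fof_group = [1, 2, 3, 4, 5]
mock_halo_a = [1, 2, 3, 4, 9]          # shares 4 of 5 members each way
mock_halo_b = [1, 2, 3] + list(range(10, 20))  # FoF group is a small part of it

match_a = is_bijective_match(fof_group, mock_halo_a)
match_b = is_bijective_match(fof_group, mock_halo_b)
print(match_a, match_b)
```

The second case shows why the test is two-way: most of the FoF group lies in the halo, but the halo is mostly made of galaxies outside the group, so the match is ambiguous.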

The overall number of groups within $0 \leq z \leq 0.5$ is remarkably consistent between the mocks
and the real groups, and for the most part comfortably within the range expected given the large
sample variance that can affect galaxy surveys such as GAMA. The histograms of raw group
multiplicity and dynamically estimated group mass show, for the most part, a large amount of
agreement between the GAMA data and the mock catalogues. Discrepancies appear at the high
multiplicity end, where GAMA finds fewer high multiplicity systems than are recovered from the mock
volumes. A more in-depth analysis of the discrepancies between GAMA and mock groups is deferred to a
later paper, still in preparation.

The showcase examples of a small number of GAMA 
groups highlight the parameter space that is now opened 
up, and demonstrate the advantages brought by having ex- 
tremely high spatial completeness. Accurate group dynamics 
and a full sample of close pairs will be of key importance for 
determining the Halo Mass Function in upcoming work, and 
for finding new constraints on the galaxy merger rate in the 
local Universe, two of the main goals of the GAMA survey. 

The G$^3$C will be made publicly available on the GAMA website (http://www.gama-survey.org) as soon
as the associated redshift data are made available. Interested parties should contact the author at
asgr@st-and.ac.uk if they wish to make use of the group catalogue data before the full public
release.